javascript - How to generate url slug from chinese characters? - Stack Overflow

To generate URL slugs I normally use the string.js library (https://github.com/jprichardson/string.js), specifically its slugify method. However, it removes all Chinese characters. As a workaround I use the following function:

var slugify = function(str){
   str = str.replace(/\s+/g,'-') // replace spaces with dashes
   str = encodeURIComponent(str) // encode (it encodes chinese characters)
   return str
}

So for the input 中文 标题 I get %E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98, which looks like this in the browser's address bar (and it works):

http://example.com/中文-标题

However, I also want to remove special characters such as !@#$%^&*) etc. The problem is that the string.js library uses the following piece of code internally:

.replace(/[^\w\s-]/g

It removes the special characters, but it also removes Chinese characters, because they are not matched by \w.

So my question is: how can I modify the above regexp so that it keeps Chinese characters?


I tried

replace(/[^a-zA-Z0-9_\s-\u3400-\u9FBF]/g,'')

But it still removes the Chinese characters...

asked Sep 6, 2014 at 9:18 by user606521, edited Sep 6, 2014 at 9:32

3 Answers

If you want to match (or exclude) a literal dash - in a set of characters (in square brackets), you have to put it at the end (or at the very start) of the set, or escape it.

Your regexp matches characters that are not

  • in the range a-z
  • in the range A-Z
  • in the range 0-9
  • _
  • in the range \s-\u3400 (that's your problem: the dash is read as sitting between \s and \u3400, so \u3400-\u9FBF never forms a range)
  • -
  • \u9FBF

You want to do:

replace(/[^a-zA-Z0-9_\u3400-\u9FBF\s-]/g,'')
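
To make the difference concrete, here is a minimal sketch comparing the two character classes, assuming a regex without the `u` flag as in the question. The name `slugifyKeepChinese` is hypothetical and not part of string.js, and collapsing and trimming dashes is an added assumption about what the slug should look like.

```javascript
// Character class from the question: the dash after \s sits between \s and \u3400,
// so \u3400-\u9FBF is never read as a range.
var broken = /[^a-zA-Z0-9_\s-\u3400-\u9FBF]/g;

// Corrected class: the literal dash is moved to the end.
var fixed = /[^a-zA-Z0-9_\u3400-\u9FBF\s-]/g;

// Hypothetical helper (not part of string.js).
function slugifyKeepChinese(str) {
  return str
    .replace(fixed, '')         // drop everything outside the allowed set
    .replace(/[\s-]+/g, '-')    // collapse runs of whitespace/dashes into one dash
    .replace(/^-+|-+$/g, '');   // trim leading and trailing dashes
}

console.log(JSON.stringify('中文 标题!'.replace(broken, ''))); // only the space survives
console.log(slugifyKeepChinese('中文 标题!@#$%^&*)'));         // 中文-标题
```

The result can still be passed through encodeURIComponent, as in the question's workaround, if a percent-encoded slug is needed.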

You can try uslug, which slugifies 汉语/漢語 to 汉语漢語.

If you want to transform Chinese characters to Pinyin, try transliteration.

Use a positive match list instead, i.e. list exactly the characters you want to remove:

  replace(/[\!@#\$%^&\*\)]/g,'')

That said, I would consider leaving the URL metacharacters (#, % and &) out of that list:

   replace(/[\!@\$\^\*\)]/g,'')
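
As a rough sketch, this removal list could be combined with the workaround from the question. The function name `slugifyWithRemovalList` is hypothetical, and the list covers only the characters named in the question, so any other punctuation would pass through to encodeURIComponent.

```javascript
// Hypothetical helper: strip an explicit list of special characters,
// then apply the question's workaround (spaces to dashes, percent-encode).
function slugifyWithRemovalList(str) {
  return encodeURIComponent(
    str
      .replace(/[!@#$%^&*)]/g, '') // remove only the listed characters
      .trim()
      .replace(/\s+/g, '-')        // replace whitespace runs with a dash
  );
}

console.log(slugifyWithRemovalList('中文 标题!@#'));
// %E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98
```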
