Javascript regex & Japanese symbols - Stack Overflow

I use the search() method of the string object to find a match between a regular expression and a string.

It works fine for English words:

"google".search(/\bg/g) // returns 0

But this code doesn't work for Japanese strings:

"アイスランド語".search(/\bア/g) // returns -1

How can I change the regex to find a match between Japanese strings and a regular expression?

Asked Oct 26, 2011 at 8:51 by Andrei; edited Oct 14, 2012 at 13:34 by dda.

2 Answers

Sadly, JavaScript's regex engine is "ASCII only". Unicode is not supported, in the sense that non-ASCII Unicode characters aren't divided into classes, so \d matches only 0-9, for example. If you need advanced (Unicode-aware) regexes in JavaScript, you can try XRegExp: http://xregexp.com/
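As an illustration of the "ASCII only" point, here is a minimal sketch. It assumes an engine that implements ES2018, which added Unicode property escapes (\p{...}) behind the `u` flag, a feature that postdates this answer. The variable names are only for illustration.

```javascript
// The shorthand classes \d and \w cover ASCII only.
const asciiDigit = /\d/.test("٣");   // U+0663 ARABIC-INDIC DIGIT THREE -> false
const asciiWord = /\w/.test("ア");    // U+30A2 KATAKANA LETTER A -> false

// Unicode property escapes need the `u` flag (ES2018 and later).
const unicodeDigit = /\p{Nd}/u.test("٣");            // decimal digit in any script -> true
const unicodeLetter = /\p{L}/u.test("ア");            // letter in any script -> true
const katakana = /\p{Script=Katakana}/u.test("ア");   // true
const han = /\p{Script=Han}/u.test("語");             // true

console.log({ asciiDigit, asciiWord, unicodeDigit, unicodeLetter, katakana, han });
```

On engines without property-escape support, a library such as XRegExp remains the fallback.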

And we won't even delve into the problem of surrogate pairs. A "character" in JavaScript is a UTF-16 code unit, so it isn't always a full Unicode character. Fortunately, everyday Japanese should be entirely in the BMP (but note that some of the unified Han ideographs, the CJK extensions, are in Plane 2, so each of those characters takes two UTF-16 code units).
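The surrogate-pair issue can be seen directly in a short sketch. The sample character U+2000B (from CJK Unified Ideographs Extension B, in Plane 2) is chosen here only as an example, and the variable names are illustrative.

```javascript
const bmpChar = "ア";               // U+30A2, inside the BMP
const astralChar = "\u{2000B}";     // U+2000B, Plane 2 (CJK Extension B)

const bmpUnits = bmpChar.length;                 // 1 UTF-16 code unit
const astralUnits = astralChar.length;           // 2 UTF-16 code units (a surrogate pair)
const astralCodePoints = [...astralChar].length; // 1: string iteration walks code points

// Without the `u` flag, "." matches one code unit, so it cannot span the pair.
const dotWithoutU = /^.$/.test(astralChar);      // false
// With the `u` flag, "." matches one whole code point.
const dotWithU = /^.$/u.test(astralChar);        // true

console.log({ bmpUnits, astralUnits, astralCodePoints, dotWithoutU, dotWithU });
```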

If you want to read more about Unicode, you could start with the Wikipedia article "Mapping of Unicode characters", for example.

The problem is the \b, which matches only in these positions:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

(see: http://www.regular-expressions.info/wordboundaries.html)

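In JavaScript, a word character is anything in the character class [a-zA-Z0-9_]; the ECMAScript definition is ASCII-only. Since ア is not a word character, there is no word boundary before it at the start of the string, and /\bア/ cannot match.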
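One possible workaround, sketched under assumptions: on an engine that supports ES2018 lookbehind and Unicode property escapes, the ASCII-only \b can be replaced by a negative lookbehind that rejects a preceding Unicode letter, digit, or underscore. The helper name searchAtWordStart is hypothetical and not part of any library.

```javascript
// Hypothetical helper: index of the first occurrence of `word` that is not
// preceded by a Unicode letter, digit, or underscore; -1 if there is none.
function searchAtWordStart(text, word) {
  // Escape regex syntax characters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?<!...) is a negative lookbehind; \p{L} and \p{N} require the `u` flag.
  const pattern = new RegExp(`(?<![\\p{L}\\p{N}_])${escaped}`, "u");
  return text.search(pattern);
}

console.log(searchAtWordStart("google", "g"));                 // 0
console.log(searchAtWordStart("アイスランド語", "ア"));           // 0
console.log(searchAtWordStart("日本語 アイスランド語", "ア"));     // 4
console.log(searchAtWordStart("ランドア", "ア"));                // -1
```

Japanese text usually has no spaces between words, so this only detects the start of the string or a position after punctuation or whitespace. If the goal is simply "starts with", anchoring with ^ instead of \b (for example /^ア/) may be enough.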
