JavaScript regex case-insensitive Turkish character issue - Stack Overflow


I'm using a regex to filter some content.

var word = new RegExp(filterWord, "gi"); // "gi" means global and case-insensitive
content = content.replace(word, ""); // removes every match of filterWord from content

This code works properly, but when the content contains an uppercase "İ", the word is not replaced.

For example, if

filterWord = "istanbul";

and

content = "İstanbul";

the code above does not work. If I change filterWord from istanbul to İstanbul, it works, but then the match is no longer case-insensitive. How can I solve this problem?


asked May 30, 2014 at 23:47 by Erdi; edited May 31, 2014 at 2:28 by dereli

3 Answers


You can list both the lowercase and the uppercase form in a character class:

/[İi]stanbul/i

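Since the question builds the pattern from a variable, the same idea can be applied programmatically. The following is only a sketch under some assumptions: `buildTurkishInsensitive` is a hypothetical helper name, and it treats i, I, İ and ı as interchangeable, which is looser than the real Turkish casing rules (i/İ and ı/I are separate pairs).

```javascript
// Hypothetical helper, not part of the original answer.
function buildTurkishInsensitive(filterWord) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = filterWord.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Replace every i-like letter with a class covering all four forms.
  const pattern = escaped.replace(/[iIİı]/g, "[iIİı]");
  return new RegExp(pattern, "gi");
}

const word = buildTurkishInsensitive("istanbul");
const content = "İstanbul|istanbul|ISTANBUL|Istanbul";
// Every spelling is removed; only the separators remain.
const filtered = content.replace(word, "");
console.log(filtered);
```

The `i` flag still handles the other letters, so only the i-like letters need the explicit class.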

How a regex treats lowercase and uppercase characters depends on the characters' hex codes and on the case mappings the Unicode Consortium defines for them. As far as I know this holds for any language, since Unicode is an international standard.

For example, for English: [image of the code chart not preserved]

Similarly, for accented letters: [image of the code chart not preserved]

In those charts, characters highlighted in the same color are the uppercase and lowercase forms of the same letter, and their hex codes differ in a single digit. For Ê the hex code is 00CA and for ê it is 00EA; only the third digit differs (C versus E).

Similarly, for Ý and ý the hex codes are 00DD and 00FD; only the third digit differs (D versus F).

Now check this eg:

'ÊÌÝêìý'.match(/Ì/gi) //case insensitive
//output ["Ì", "ì"]
'ÊÌÝêìý'.match(/Ì/g) //case sensitive
//output ["Ì"]

'ÊÌÝêìý'.match(/Ý/ig) //case insensitive
//output ["Ý", "ý"]
'ÊÌÝêìý'.match(/Ý/g) //case sensitive
//output ["Ý"]

If you use the right characters, it should work normally. I don't know much about the Turkish Latin characters, though.
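For the Turkish İ (U+0130) the picture is different from Ì and Ý. The short check below is a sketch of what the built-in case handling does with it; the variable names are illustrative only.

```javascript
// "I" and "i" are paired by the i flag, so this matches.
const plainCapital = /istanbul/i.test("Istanbul");

// "İ" (U+0130) is not paired with "i", with or without the u flag.
const dottedCapital = /istanbul/i.test("İstanbul");
const dottedCapitalUnicode = /istanbul/iu.test("İstanbul");

// The default lowercase of "İ" is "i" followed by a combining dot above
// (U+0307), which is two code units rather than a plain "i".
const lowered = "İ".toLowerCase();

console.log(plainCapital, dottedCapital, dottedCapitalUnicode, lowered.length);
```

This is why the filter in the question leaves "İstanbul" untouched while it still removes "Istanbul".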

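This is a Unicode issue.

The İ in your example is not a plain I. It can be stored either as the single code point U+0130 or as I followed by a combining dot above (U+0307), and in the second form the dot counts as a character of its own. This brings a lot of complexity, and there are rules you need to follow to handle Unicode correctly.

You could use something like ([\x{0049}-\x{0130}]) to cover your i needs, but the syntax varies by engine (Java, JavaScript, PHP, and so on); JavaScript, for example, writes these escapes as \u0049 and \u0130. Note that this is a range, so it matches every code point from U+0049 (I) through U+0130 (İ), not only those two letters.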
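A minimal JavaScript sketch of this idea follows. It assumes an explicit class of the two letters instead of a range, and it assumes the content may contain İ in decomposed form (I followed by U+0307), in which case normalizing to NFC first lets the same pattern match. The variable names are illustrative.

```javascript
const composed = "\u0130stanbul";     // İ as a single code point
const decomposed = "I\u0307stanbul";  // I followed by a combining dot above

// Explicit class: matches i, I (through the i flag) and İ, nothing in between.
const dottedWord = /[i\u0130]stanbul/gi;

const fromComposed = composed.replace(dottedWord, "");

// The combining dot sits between "I" and "s", so the pattern does not match.
const fromDecomposed = decomposed.replace(dottedWord, "");

// NFC recombines I + U+0307 into U+0130 before matching.
const fromNormalized = decomposed.normalize("NFC").replace(dottedWord, "");

console.log([fromComposed, fromDecomposed === decomposed, fromNormalized]);
```

Normalizing the content once, before any filtering, keeps the pattern itself simple.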


You could also check what code each character represents here:

http://www.fileformat.info/info/unicode/char/search.htm?q=%C4%B0&preview=entity
