javascript - Regex: Disable Symbols - Stack Overflow

Is there any way to disable all symbols, punctuation, block elements, geometric shapes and dingbats, such as these:

✁ ✂ ✃ ✄ ✆ ✇ ✈ ✉ ✌ ✍ ✎ ✏ ✐ ✑ ✒ ✓ ✔ ✕ ⟻ ⟼ ⟽ ⟾ ⟿ ⟻ ⟼ ⟽ ⟾ ⟿ ▚ ▛ ▜ ▝ ▞ ▟

without writing down all of them in the regular expression pattern, while still allowing all other normal language characters (Chinese, Arabic, etc.), such as these:

文化中国 الجزيرة نت

?

I'm building a JavaScript validation function and my real problem is that I can't use:

[a-zA-Z0-9] 

Because this excludes a lot of languages too, not just the symbols.

asked Jan 18, 2011 at 18:18 by Adam Halasz (edited Jan 18, 2011 at 19:27)
  • Sounds more like a character-by-character filter task than a regex task! – Cascabel Commented Jan 18, 2011 at 18:21
  • Can you give a reason? There's probably a better way to go about this. – greggreg Commented Jan 18, 2011 at 18:23
  • regex is definitely not the solution for this problem – Ben Lee Commented Jan 18, 2011 at 18:25
  • The normal way to do this is by using Unicode properties. If you have access to those, it’s simplicity itself; if you don’t, it’s tantamount to impossible. Last I looked javascript didn’t give you access to Unicode properties. – tchrist Commented Jan 18, 2011 at 21:02
  • I've added a note below about XRegExp, which may help you, if you take advantage of the Unicode Plugin. xregexp.com/plugins – JasonTrue Commented Jan 21, 2011 at 18:51

5 Answers

The Unicode standard divides up all the possible characters into code charts. Each code chart contains related characters. If you want to exclude (or include) only certain classes of characters, you will have to make a suitable list of exclusions (or inclusions). Unicode is big, so this might be a lot of work.

Not really.

JavaScript doesn't support Unicode Character Properties. The closest you'll get is excluding ranges by Unicode code point as Greg Hewgill suggested.

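For example, to match all of the characters from U+2190 to U+259F, a range that runs from the Arrows block through the Block Elements block and includes the Mathematical Operators: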

/[\u2190-\u259F]/
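
A minimal sketch of how that range approach could be turned into a validation check. The helper name `containsListedSymbol` is hypothetical, and the extra ranges (Geometric Shapes, Dingbats, Supplemental Arrows-A) are assumptions chosen only to cover the sample characters in the question; a real exclusion list would need to be much longer.

```javascript
// Illustrative, non-exhaustive list of symbol ranges:
//   U+2190-U+25FF  Arrows through Geometric Shapes (includes Block Elements)
//   U+2700-U+27BF  Dingbats
//   U+27F0-U+27FF  Supplemental Arrows-A
const SYMBOL_RANGES = /[\u2190-\u25FF\u2700-\u27BF\u27F0-\u27FF]/;

// Hypothetical helper: true if the text contains any character
// from the ranges listed above.
function containsListedSymbol(text) {
  return SYMBOL_RANGES.test(text);
}

console.log(containsListedSymbol('✂'));        // a dingbat
console.log(containsListedSymbol('文化中国'));  // CJK letters only
```

Every range has to be maintained by hand, which is why the other answers point towards Unicode properties instead.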

This depends on your regex dialect. Unfortunately, most existing JavaScript engines probably don't support Unicode character classes.

In regex engines such as the one in (recent) Perl or .NET, Unicode character classes can be referenced.

\p{L}: any kind of letter from any language. \p{N}: any number symbol from any language (including, as I recall, the Indian and Arabic and CJK number glyphs).
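
For readers on newer runtimes: ECMAScript 2018, which postdates these answers, added `\p{...}` property escapes to JavaScript regular expressions when the `u` flag is set. The sketch below assumes such an engine; the helper name `isLettersAndNumbers` and the decision to also allow space separators (`\p{Zs}`) are illustrative assumptions, not part of the original answer.

```javascript
// Assumes an ES2018+ engine; \p{...} is only recognised with the `u` flag.
// Allows letters (L), numbers (N) and space separators (Zs), nothing else.
const LETTERS_AND_NUMBERS = /^[\p{L}\p{N}\p{Zs}]+$/u;

// Hypothetical helper: true when the text contains only allowed characters.
function isLettersAndNumbers(text) {
  return LETTERS_AND_NUMBERS.test(text);
}

console.log(isLettersAndNumbers('文化中国'));
console.log(isLettersAndNumbers('abc ✂'));
```

Because the pattern is a whitelist, symbols, punctuation and dingbats are rejected without having to be listed.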

Because Unicode supports composed and decomposed glyphs, you may run into certain complexities: namely, if only decomposed forms exist, it's possible that you might accidentally exclude some diacritic marks in your matching pattern, and you may need to explicitly allow glyphs of the type Mark. You can mitigate this somewhat by using, if I recall correctly, a string that has been normalized using KC normalization (only for characters that have a composed form). In environments that support Unicode well, there's usually a function that allows you to normalize Unicode strings fairly easily (true in Java and .NET, at least).
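
A small sketch of the mark problem described above, assuming an engine with `String.prototype.normalize` (ES2015) and Unicode property escapes (ES2018), both newer than this answer. The helper names are hypothetical. Normalizing helps where a composed form exists, but scripts such as Devanagari use marks that have no composed form, so `\p{M}` still has to be allowed explicitly.

```javascript
const LETTERS_ONLY = /^\p{L}+$/u;
const LETTERS_AND_MARKS = /^[\p{L}\p{M}]+$/u;

// Hypothetical helpers: normalize to form KC first, then validate.
function isLettersOnly(text) {
  return LETTERS_ONLY.test(text.normalize('NFKC'));
}
function isLettersOrMarks(text) {
  return LETTERS_AND_MARKS.test(text.normalize('NFKC'));
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form of é).
const decomposedE = 'e\u0301';
// Devanagari word whose vowel signs and virama are marks with no composed form.
const hindi = '\u0939\u093F\u0928\u094D\u0926\u0940';

console.log(LETTERS_ONLY.test(decomposedE)); // raw string: the mark is not a letter
console.log(isLettersOnly(decomposedE));     // after normalization: one composed letter
console.log(isLettersOnly(hindi));           // marks remain, so letters-only rejects it
console.log(isLettersOrMarks(hindi));        // allowing \p{M} accepts it
```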

Edited to add: If you've started down this path, or have considered it, in order to regain some sanity, you may want to experiment with the Unicode Plugin for XRegExp (which will require you to take a dependency on XRegExp).

JavaScript regular expressions do not have native Unicode support. An alternative is to validate (or sanitize) the string on the server side, or to use a non-native regex library. While I've never used it, XRegExp is such a library, and it has a Unicode Plugin.

Take a look at the Unicode Planes. You probably want to exclude everything but planes 0 and 2. After that, it gets ugly as you'll have to exclude a lot of plane 0 on a case-by-case basis.
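
A rough sketch of the plane check, assuming an ES2015+ engine for `codePointAt` and string iteration by code point. The helper name `onlyPlanes0And2` is hypothetical. The plane number is the code point shifted right by 16 bits.

```javascript
// Hypothetical helper: true when every code point lies in plane 0
// (Basic Multilingual Plane) or plane 2 (Supplementary Ideographic Plane).
function onlyPlanes0And2(text) {
  for (const ch of text) {                  // iterates by code point, not UTF-16 unit
    const plane = ch.codePointAt(0) >> 16;  // plane = code point / 0x10000
    if (plane !== 0 && plane !== 2) {
      return false;
    }
  }
  return true;
}

console.log(onlyPlanes0And2('文化中国'));    // plane 0
console.log(onlyPlanes0And2('\u{1F600}'));  // plane 1 emoji
console.log(onlyPlanes0And2('✂'));          // plane 0 dingbat, still accepted
```

As the last call shows, the dingbats from the question live in plane 0, so this filter alone does not remove them; that is the case-by-case work the answer warns about.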
