javascript - Regular Expression required - Stack Overflow

I need to write a regular expression that should allow standard characters [a-zA-Z0-9] plus one space between each word, the umlauts [äöüÄÖÜ], a dot (.) and a hyphen (-) and no other special characters.

For example, the following should be allowed:

Dr. Aaryan Joshi
Phill Rozer MSc.
Ajay Verma5
Rajan-Verma MSc.

And the following should not be allowed:

Ajay     Verma
Dr. Ajay. Verma.
Test Name.-.Name2

Asked Nov 7, 2011 at 7:50 by Pushpendra; edited Feb 11, 2012 at 9:43 by Alan Moore
  • Can the string start with a number character? – You Qi Commented Nov 7, 2011 at 7:53
  • And what have you tried yourself? – KooiInc Commented Nov 7, 2011 at 7:53
  • @You Qi Choong: Yes, it can also start with numbers – Pushpendra Commented Nov 7, 2011 at 7:56
  • What makes Dr. Aaryan Joshi and Phill Rozer MSc. valid strings but not Dr. Ajay. Verma.? – You Qi Commented Nov 7, 2011 at 7:57
  • Would be a whole level easier if you specified precisely what you want to achieve, I think. – Kos Commented Feb 12, 2012 at 16:02

8 Answers

Try this

^(?! )(?!.* $)(?=[^.]*\.?[^.]*$)(?=[^-]*-?[^-]*$)(?!.*? {2,})[a-zA-Z0-9äöüÄÖÜ .-]+$

See it here on Regexr

^ and $ are anchors that match the start and the end of the string.

[a-zA-Z0-9äöüÄÖÜ .-]+ is a character class with a quantifier (+ means one or more); this part matches all the characters you want to allow.

The (?!) and (?=) are negative and positive lookaheads. They verify the conditions you set.

(?! ) Does not start with a space
(?!.* $) Does not end with a space
(?=[^.]*\.?[^.]*$) At most one dot allowed, anywhere in the string
(?=[^-]*-?[^-]*$) At most one hyphen allowed, anywhere in the string
(?!.*? {2,}) More than one space in sequence is not allowed
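
A minimal sketch of how the lookahead pattern above might be exercised in JavaScript against the examples from the question. The helper name `isValidName` and the two sample arrays are assumptions made for illustration; only the regex itself comes from this answer.

```javascript
// Lookahead-based validator; the pattern is copied from the answer above.
// `isValidName` is a hypothetical helper name, not part of the original answer.
const NAME_PATTERN = /^(?! )(?!.* $)(?=[^.]*\.?[^.]*$)(?=[^-]*-?[^-]*$)(?!.*? {2,})[a-zA-Z0-9äöüÄÖÜ .-]+$/;

function isValidName(name) {
  return NAME_PATTERN.test(name);
}

// Sample inputs taken from the question.
const shouldPass = ["Dr. Aaryan Joshi", "Phill Rozer MSc.", "Ajay Verma5", "Rajan-Verma MSc."];
const shouldFail = ["Ajay     Verma", "Dr. Ajay. Verma.", "Test Name.-.Name2"];

for (const name of shouldPass.concat(shouldFail)) {
  console.log(JSON.stringify(name), isValidName(name));
}
```

Each lookahead checks one rule independently, so a rule can be dropped or tightened without touching the others.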

Sometimes it's easier to specify what's not allowed. The following finds an unapproved character, two consecutive spaces, or more than one dot or hyphen.

([^a-zA-Z0-9äöüÄÖÜ. -]|  |\..*\.|-.*-)

If spaces are not allowed at the beginning or end of string you can use:

([^a-zA-Z0-9äöüÄÖÜ. -]|^ | $|  |\..*\.|-.*-)
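
As a rough sketch of this "search for a violation" approach in JavaScript: a name is accepted only when the pattern finds nothing. The helper name `findViolation` is an assumption for illustration. Note that an empty string contains no violation, so a separate length check would be needed if empty input must be rejected.

```javascript
// Inverse check: look for anything that is NOT allowed.
// The pattern is the second one from the answer above (it also rejects
// leading and trailing spaces). `findViolation` is a hypothetical helper name.
const VIOLATION = /([^a-zA-Z0-9äöüÄÖÜ. -]|^ | $|  |\..*\.|-.*-)/;

function findViolation(name) {
  const match = VIOLATION.exec(name);
  // Return the offending substring, or null when the name is acceptable.
  return match === null ? null : match[0];
}

console.log(findViolation("Rajan-Verma MSc."));   // no violation expected
console.log(findViolation("Ajay     Verma"));     // consecutive spaces
console.log(findViolation("Dr. Ajay. Verma."));   // more than one dot
```

A side benefit of this style is that the matched substring hints at which rule was broken, which can be handy for error messages.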

My solution is this:

^((([a-zA-Z0-9äöüÄÖÜ]+(-[a-zA-Z0-9äöüÄÖÜ]+)?\s)*(([a-zA-Z0-9äöüÄÖÜ]*\.)|
(\.[a-zA-Z0-9äöüÄÖÜ]+)))|(((([a-zA-Z0-9äöüÄÖÜ]*\.)|(\.[a-zA-Z0-9äöüÄÖÜ]+))\s)?
([a-zA-Z0-9äöüÄÖÜ]+(-[a-zA-Z0-9äöüÄÖÜ]+)?\s)*[a-zA-Z0-9äöüÄÖÜ]+
(-[a-zA-Z0-9äöüÄÖÜ]+)?))$

Breaking it down

A dot-free word with a hyphen matches the pattern below. It's assumed that a hyphen, if present, is not allowed at the beginning or the end. If this assumption is wrong, it is easy enough to adjust accordingly.

[a-zA-Z0-9äöüÄÖÜ]+-[a-zA-Z0-9äöüÄÖÜ]+

A dot-free word without hyphen matches:

[a-zA-Z0-9äöüÄÖÜ]+ 

Thus a dot-free word, with or without a hyphen (and at most one hyphen), matches:

[a-zA-Z0-9äöüÄÖÜ]+(-[a-zA-Z0-9äöüÄÖÜ]+)?

We can make use of a general design pattern (no pun intended), to get exactly one X with any number of Y:

(Y*X)|(XY+)

So applying this rule, a dotted word with exactly one dot matches:

([a-zA-Z0-9äöüÄÖÜ]*\.)|(\.[a-zA-Z0-9äöüÄÖÜ]+)

Similarly, to get a stream of words with exactly one dotted word, we slightly modify the general rule to allow for a space separator. So a stream of words with exactly one dotted word matches:

  ((Y\s)*X)|(X\s(Y\s)*Y)

where:
  1. Y = the regex for a dot-free word
  2. X = the regex for a dotted word

Similarly a stream of only dot-free words would match:

(Y\s)*Y

where Y is as before.

Combining the two meta-regexes, a stream of words with at most one dotted word matches:

((Y\s)*X)|((X\s)?(Y\s)*Y)

where X and Y are as before.

The final step is to substitute X and Y back into the preceding meta-regex to yield my proposed solution. The really nice thing about my solution is that it does not use lookaheads, so resolution is faster and it works on all flavours of regex, including even the very primitive flavours from XML Schema, XPath and XSLT.
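
The substitution step can be sketched in JavaScript by building the pattern from strings. The variable names (`A`, `Y`, `X`, `NO_LOOKAHEAD`) are illustrative assumptions, not part of the answer. Because X is itself an alternation, the sketch wraps it in its own group, and it groups the whole top-level alternation so that `^` and `$` apply to both branches.

```javascript
// Compose the lookahead-free pattern from the building blocks described above.
// All names here are illustrative; they do not appear in the original answer.
const A = "[a-zA-Z0-9äöüÄÖÜ]";          // one allowed alphanumeric character
const Y = `${A}+(-${A}+)?`;              // dot-free word with at most one hyphen
const X = `((${A}*\\.)|(\\.${A}+))`;     // dotted word, wrapped in its own group

// ((Y\s)*X)|((X\s)?(Y\s)*Y), grouped as a whole before anchoring.
const NO_LOOKAHEAD = new RegExp(
  `^(((${Y}\\s)*${X})|((${X}\\s)?(${Y}\\s)*${Y}))$`
);

console.log(NO_LOOKAHEAD.test("Dr. Aaryan Joshi"));
console.log(NO_LOOKAHEAD.test("Dr. Ajay. Verma."));
```

As in the answer, `\s` is used for the separator, so this sketch would also accept a tab between words; a literal space could be substituted if that matters.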

Add ^ and $ at the start and end if needed.

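In regex flavours that support Unicode character property classes, you can use them to match umlauts (e.g. \p{Mn} for combining marks).

Unfortunately this was not supported in JavaScript regex at the time of writing (Unicode property escapes were added later, in ES2018, and require the u flag), so you need to explicitly specify the characters you want to accept.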

This will therefore do the trick for you

(?:[a-zA-Z\däöüÄÖÜ-]+(?:\.(?!.*\.))?[ ]{0,1})+

If you want to enforce just one hyphen change it to this

(?:[a-zA-Z\däöüÄÖÜ]+(?:\.(?!.*\.))?(?:-(?!.*-)|[ ]{0,1}))+

Explanation

I will break it down, using 'Dr. Aaryan Joshi' as an example.

Anything enclosed in [ ] is what we call a 'character class'. It means match any one of these characters.

For a start, disregard any '?:' for now, leaving us with ([a-zA-Z\däöüÄÖÜ-]+(\.(?!.*\.))?[ ]{0,1})+

So, with [a-zA-Z\däöüÄÖÜ-] we're saying:

  • Capture any word character (a-zA-Z0-9), any of these umlauts (äöüÄÖÜ) and any '-'

By adding a + we are saying '1 or more' of these.

This would only match 'Dr' since we are not yet accepting a period or a space character.

Next we add (\.(?!.*\.))? which means: match a literal '.' (\.) that is not followed by another '.' anywhere later in the string ((?!.*\.)). We enclose this in brackets followed by a ?, which means 'you don't have to match this always', i.e. there may or may not be a '.', but if there is, make sure it's the only one.

Now we're matching 'Dr.' but we wouldn't match it if there was another '.' further down the line.

Next we add another character class for a space, [ ], and use the {,} notation to indicate the bounds. So [ ]{0,1} means 'match 0 or 1 space' (an alternative notation is the ? symbol, e.g. [ ]? or simply ' ?', but {0,1} is more explicit).

This would now match 'Dr. '.

The last step is to indicate that we want to capture multiple instances of this. So we wrap it all up in brackets, and use the + to indicate that we want to capture '1 or more' of these.

Which would now match the whole string 'Dr. Aaryan Joshi'

As a final touch, we add '?:' to all the capture groups to indicate that we are only matching patterns and do not want to store a reference to the matched groups (saves memory :))

Further to @AlanMoore's comments, you can of course also add anchors to this regex like so

^(?:[a-zA-Z\däöüÄÖÜ]+(?:\.(?!.*\.))?(?:-(?!.*-)|[ ]{0,1}))+$

Should you want to set a minimum limit to the amount of characters accepted, then change the first plus to a bound.. ex {3,} to say '3 or more'.

Hope this helps :)

Note I've tested all these against the acceptance criteria you gave, and it matches all cases when using the JavaScript regex engine :)

Edit Swapped out \w to a-zA-Z\d since \w would also accept the _ character (Thank you @AlanMoore for pointing that out)
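
On the earlier remark about Unicode property classes, here is a small sketch of how they behave in an engine that supports them (ES2018 or later, with the `u` flag). This is an illustration under that assumption, not part of the original answer, and the constant names are made up. Note that `\p{L}` accepts every letter of every script, which is much broader than the six umlauts the question asks for, and that `\p{Mn}` only matches separate combining marks, not precomposed characters such as ü.

```javascript
// Requires an ES2018+ engine; \p{...} is only recognised with the `u` flag.
const ANY_LETTERS = /^\p{L}+$/u;     // any Unicode letters, far broader than äöüÄÖÜ
const COMBINING_MARK = /\p{Mn}/u;    // nonspacing marks such as U+0308 (combining diaeresis)

const precomposed = "M\u00FCller";   // "Müller" with ü as a single code point
const decomposed = "Mu\u0308ller";   // "Müller" as u followed by U+0308

console.log(ANY_LETTERS.test(precomposed));     // ü is itself a letter
console.log(COMBINING_MARK.test(precomposed));  // no separate mark present
console.log(COMBINING_MARK.test(decomposed));   // U+0308 is a nonspacing mark
```

Listing the accepted characters explicitly, as the answer does, remains the more predictable choice when only a fixed set of umlauts should pass.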

Try this (in JavaScript, to match your comment above)

<!DOCTYPE html>
<html>
    <head>
        <script type="text/javascript">
            // All "true"
            alert(test("Dr. Aaryan Joshi"));
            alert(test("Phill Rozer MSc."));
            alert(test("Ajay Verma5"));
            alert(test("Rajan-Verma MSc."));

            // All "false"
            alert(test("Ajay     Verma"));
            alert(test("Dr. Ajay. Verma."));
            alert(test("Test Name.-.Name2"));

            function test( name ) {
                var pattern = /^(?=[^.]*\.?[^.]*$)(?!.* $)([A-Za-z0-9äöüÄÖÜ.-]+( |$))+$/;
                /*
                 * (?=[^.]*\.?[^.]*$)
                 *  - Contains zero or one dots (.)
                 * (?!.* $)
                 * - Does not end with a space (as in stema's answer)
                 * ([A-Za-z0-9äöüÄÖÜ.-]+( |$))
                 * - Matches chars specified, ending with one space or end of string
                 * 
                 * (Whole pattern is anchored to start & end of string too)
                 */
                return pattern.test(name);
            }
        </script>
    </head>
    <body>

    </body>
</html>
/^(([\wäöüÄÖÜ]+|[\wäöüÄÖÜ]+\-[\wäöüÄÖÜ]+)\.?\s?)+$/.test(yourString)

/^(?=[^.]+(?:\.[^.]*)?$)(?=[^-]+(?:-[^-]*)?$)[A-Za-z0-9äöüÄÖÜ.-]+(?:[ ][A-Za-z0-9äöüÄÖÜ.-]+)*$/

These are the criteria as I understand them:

  1. Must start with a letter or a digit (including those accented letters).
  2. May have at most one dot ('.'), which can be anywhere but the beginning.
  3. May have at most one hyphen ('-'), ditto.
  4. May contain any number of spaces, but they can't be at the beginning or end, and they must not be consecutive.
  5. May contain any number of alphanumeric characters from the set [A-Za-z0-9äöüÄÖÜ] as long as the other criteria are met. That is, there has to be at least one alphanumeric at the beginning (rule 1) and at least one following each space, if there are any (rule 4).

And here's a breakdown of the regex:

^
(?=[^.]+(?:\.[^.]*)?$)      # at most one dot, not at the beginning
(?=[^-]+(?:-[^-]*)?$)       # at most one hyphen, ditto
[A-Za-z0-9äöüÄÖÜ.-]+        # first "word"
(?:
  [ ]                       # space presaging another "word"
  [A-Za-z0-9äöüÄÖÜ.-]+      # the next "word"
)*
$
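
Standard JavaScript regex literals have no free-spacing mode, so the commented layout above cannot be pasted in as-is. One way to keep the comments, sketched here under that assumption, is to assemble the pattern from small pieces via their `.source` strings. The names `WORD`, `PARTS` and `NAME_RE` are illustrative, not from the answer.

```javascript
// Rebuild the commented breakdown from small regex literals joined together.
// `WORD`, `PARTS` and `NAME_RE` are hypothetical names used for illustration.
const WORD = /[A-Za-z0-9äöüÄÖÜ.-]+/.source;

const PARTS = [
  /^/.source,
  /(?=[^.]+(?:\.[^.]*)?$)/.source,   // at most one dot, not at the beginning
  /(?=[^-]+(?:-[^-]*)?$)/.source,    // at most one hyphen, not at the beginning
  WORD,                              // first "word"
  "(?:[ ]" + WORD + ")*",            // a single space, then the next "word"
  /$/.source,
];

const NAME_RE = new RegExp(PARTS.join(""));

console.log(NAME_RE.test("Rajan-Verma MSc."));
console.log(NAME_RE.test("Ajay     Verma"));
```

Using `.source` on literals avoids the double backslashes that plain strings would need.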

This works for me:

^[a-zA-Z0-9äöüÄÖÜ\-]+\.?.[a-zA-Z0-9äöüÄÖÜ\-]+(.[a-zA-Z0-9äöüÄÖÜ\-]+)?\.?$

Remember to put a slash before and after the pattern when using it as a regex literal in JavaScript.