I want to find the elements of a math expression that are not wrapped between { and }.
Examples:
Input:
abc+1*def
Matches:["abc", "1", "def"]
Input:
{abc}+1+def
Matches:["1", "def"]
Input:
abc+(1+def)
Matches:["abc", "1", "def"]
Input:
abc+(1+{def})
Matches:["abc", "1"]
Input:
abc def+(1.1+{ghi})
Matches:["abc def", "1.1"]
Input:
1.1-{abc def}
Matches:["1.1"]
Rules:
- The expression is well-formed (so there won't be an opening parenthesis without a closing parenthesis, or an opening { without a }).
- The math symbols allowed in the expression are + - / * and ( ).
- Numbers could be decimals.
- Variables could contain spaces.
- Only one level of { } (no nested brackets).
So far, I ended up with (http://regex101.com/r/gU0dO4):
(^[^/*+({})-]+|(?:[/*+({})-])[^/*+({})-]+(?:[/*+({})-])|[^/*+({})-]+$)
I split the task into 3:
- match elements at the beginning of the string
- match elements that are between two { and }
- match elements at the end of the string
But it doesn't work as expected.
Any ideas?
asked May 14, 2014 at 8:28 by fluminis, edited May 14, 2014 at 9:41
Comments:
- Consider going the other way as the first (and independent) step - replace all "{..}" with "". – user2864740 Commented May 14, 2014 at 8:31
- Using regex to validate parenthesis pairs is already quite complicated; for the sanity of the maintainer, consider alternatives to doing it in regex. – Theraot Commented May 14, 2014 at 8:35
- I do not want to validate the expression, I know for sure that the expression is valid. I just want to get the tokens that are not wrapped between { and }. – fluminis Commented May 14, 2014 at 8:38
- Do you have only one level of { } or can you have deep nesting (which changes everything and would require parsing)? – Denys Séguret Commented May 14, 2014 at 8:55
- This really changes everything... – Denys Séguret Commented May 14, 2014 at 9:06
4 Answers
Matching {}s, especially nested ones, is hard (read: impossible) for a standard regular expression, since it requires counting the number of {s you encountered so you know which } terminated it.
Instead, a simple string manipulation method could work. This is a very basic parser that just reads the string left to right and consumes it when outside of braces.
    var input = "abc def+(1.1+{ghi})"; // I assume well formed, as well as no precedence
    var inParens = false; // true while we are inside { ... }
    var output = [], buffer = "", parenCount = 0;
    for (var i = 0; i < input.length; i++) {
        if (!inParens) {
            if (input[i] === "{") {
                inParens = true;
                parenCount++;
            } else if (["+", "-", "(", ")", "/", "*"].some(function (x) {
                return x === input[i];
            })) { // got symbol
                if (buffer !== "") { // buffer has stuff to add to output
                    output.push(buffer); // add the token read so far
                    buffer = "";
                }
            } else { // letter, digit, dot or space
                buffer += input[i]; // push to buffer
            }
        } else { // inParens is true: skip until the matching }
            if (input[i] === "{") parenCount++;
            if (input[i] === "}") parenCount--;
            if (parenCount === 0) inParens = false; // consume again
        }
    }
    if (buffer !== "") output.push(buffer); // flush a token that ends the string
    // output is ["abc def", "1.1"]
This might be an interesting regexp challenge, but in the real world you'd be much better off simply finding all [^+/*()-]+ groups and removing those enclosed in {}'s:
    "abc def+(1.1+{ghi})".match(/[^+/*()-]+/g).filter(
        function(x) { return !/^{.+?}$/.test(x) })
    // ["abc def", "1.1"]
That being said, regexes are not the right way to parse math expressions. For serious parsing, consider using formal grammars and parsers. There are plenty of parser generators for JavaScript; for example, in PEG.js you can write a grammar like
    expr
        = left:multiplicative "+" expr
        / multiplicative
    multiplicative
        = left:primary "*" right:multiplicative
        / primary
    primary
        = atom
        / "{" expr "}"
        / "(" expr ")"
    atom = number / word
    number = n:[0-9.]+ { return parseFloat(n.join("")) }
    word = w:[a-zA-Z ]+ { return w.join("") }
and generate a parser which will be able to turn
abc def+(1.1+{ghi})
into
    [
        "abc def",
        "+",
        [
            "(",
            [
                1.1,
                "+",
                ["{", "ghi", "}"]
            ],
            ")"
        ]
    ]
Then you can iterate over this array as usual and pick out the parts you're interested in.
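As a rough sketch of that last step, the walk below collects every atom that is not inside a braces group. The function name collectUnwrapped is hypothetical, and the code assumes the parser returns exactly the nested-array shape shown above, where a bracketed group is a three-element array [open, inner, close]. A different grammar or different actions would need a different walk.

```javascript
// Hypothetical helper (not part of PEG.js): collect the atoms that are
// not inside a "{" ... "}" group. Assumes the nested-array shape shown
// above, where a bracketed group is [open, inner, close].
function collectUnwrapped(node, out) {
    out = out || [];
    if (!Array.isArray(node)) {
        // Leaf: keep numbers and words, drop operator and bracket strings.
        if (typeof node === "number" || !/^[-+*\/(){}]$/.test(node)) {
            out.push(node);
        }
        return out;
    }
    if (node.length === 3 && node[0] === "{" && node[2] === "}") {
        return out; // skip everything wrapped in braces
    }
    node.forEach(function (child) {
        collectUnwrapped(child, out);
    });
    return out;
}

var tree = ["abc def", "+", ["(", [1.1, "+", ["{", "ghi", "}"]], ")"]];
var result = collectUnwrapped(tree);
// result is ["abc def", 1.1]
```

Note that 1.1 comes back as a number rather than a string, because the number rule in the grammar calls parseFloat.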
The variable names you mentioned can be matched by \b[\w.]+\b, since they are strictly bounded by word separators.
Since you have well-formed formulas, the names you don't want to capture are strictly followed by }, therefore you can use a lookahead expression to exclude these:
(\b[\w.]+ \b)(?!})
This will match the required elements (http://regexr.com/38rch).
Edit:
For more complex uses, like correctly matching:
- abc {def{}}
- abc def+(1.1+{g{h}i})
we need to change the lookahead term to (?|({|})).
To include the match of 1.2-{abc def}, we need to change the \b [1]. This term uses lookaround expressions, which are not available in JavaScript, so we have to work around that:
(?:^|[^a-zA-Z0-9. ])([a-zA-Z0-9. ]+(?=[^0-9A-Za-z. ]))(?!({|}))
This seems to be a good one for our examples (http://regex101.com/r/oH7dO1).
[1] \b is the separation between a \w and a \W, \z or \a. Since \w does not include space and \W does, it is incompatible with the definition of our variable names.
Following up on user2864740's comment, you can replace everything between { and } with an empty string and then match what remains.
var matches = "string here".replace(/{.+?}/g,"").match(/\b[\w. ]+\b/g);
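For illustration, the same one-liner can be wrapped in a small function and run against the inputs from the question. The name unwrappedTokens is hypothetical, and the fallback to an empty array is an addition, since match returns null when nothing is left to match. This is only a sketch under the question's assumptions (well-formed input, one level of braces).

```javascript
// Hypothetical wrapper around the replace-then-match idea above.
// Assumes well-formed input with a single level of { }.
function unwrappedTokens(expr) {
    // Drop every {...} group, then pick up runs of word characters,
    // dots and spaces that start and end on a word boundary.
    var stripped = expr.replace(/\{.+?\}/g, "");
    return stripped.match(/\b[\w. ]+\b/g) || []; // match() yields null on no match
}

var samples = [
    "abc+1*def",
    "{abc}+1+def",
    "abc+(1+def)",
    "abc+(1+{def})",
    "abc def+(1.1+{ghi})",
    "1.1-{abc def}"
];
var results = samples.map(unwrappedTokens);
```

Because both ends of a match must sit on a word boundary, leading and trailing spaces around a token are not included in it.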
Since you know that expressions are valid, just select \w+