javascript - Using parser combinator to parse simple math expression - Stack Overflow

I'm using parsimmon to parse a simple math expression and I'm failing to parse a simple math

I'm using parsimmon to parse a simple math expression and I'm failing to parse a simple math expression that follows order of operation (i.e. */ has higher precedence than +-).

Even if you are not familiar with this library please help me solve precedence problem without left recursion and infinite recursion.

Thank you.

I used TypeScript:

"use strict";

// Run me with Node to see my output!

import * as P from "parsimmon";
import {Parser} from "parsimmon";
// @ts-ignore
import util from "util";


///////////////////////////////////////////////////////////////////////

// Use the JSON standard's definition of whitespace rather than Parsimmon's.
let whitespace = P.regexp(/\s*/m);

// JSON is pretty relaxed about whitespace, so let's make it easy to ignore
// after most text.
function token(parser: Parser<string>) {
  return parser.skip(whitespace);
}

// Several parsers are just strings with optional whitespace.
function word(str: string) {
  return P.string(str).thru(token);
}

let MathParser = P.createLanguage({
  expr: r => P.alt(r.sExpr2, r.sExpr1, r.number),
  sExpr1: r => P.seqMap(r.iExpr, P.optWhitespace, r.plusOrMinus, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
  sExpr2: r => P.seqMap(r.iExpr, P.optWhitespace, r.multiplyOrDivide, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
  iExpr: r => P.alt(r.iExpr, r.number),    // Issue here! this causes infinite recursion
  // iExpr: r => r.number  // this will fix infinite recursion but yields invalid parse

  number: () =>
    token(P.regexp(/[0-9]+/))
      .map(Number)
      .desc("number"),

  plus: () => word("+"),
  minus: () => word("-"),
  plusOrMinus: r => P.alt(r.plus, r.minus),

  multiply: () => word("*"),
  divide: () => word("/"),
  multiplyOrDivide: r => P.alt(r.multiply, r.divide),

  operator: r => P.alt(r.plusOrMinus, r.multiplyOrDivide)
});

///////////////////////////////////////////////////////////////////////

let text = "3 / 4 - 5 * 6 + 5";

let ast = MathParser.expr.tryParse(text);
console.log(util.inspect(ast, {showHidden: false, depth: null}));

This is my repo

I'm using parsimmon to parse a simple math expression and I'm failing to parse a simple math expression that follows order of operation (i.e. */ has higher precedence than +-).

Even if you are not familiar with this library please help me solve precedence problem without left recursion and infinite recursion.

Thank you.

I used TypeScript:

"use strict";

// Run me with Node to see my output!

import * as P from "parsimmon";
import {Parser} from "parsimmon";
// @ts-ignore
import util from "util";


///////////////////////////////////////////////////////////////////////

// Use the JSON standard's definition of whitespace rather than Parsimmon's.
let whitespace = P.regexp(/\s*/m);

// JSON is pretty relaxed about whitespace, so let's make it easy to ignore
// after most text.
function token(parser: Parser<string>) {
  return parser.skip(whitespace);
}

// Several parsers are just strings with optional whitespace.
function word(str: string) {
  return P.string(str).thru(token);
}

let MathParser = P.createLanguage({
  expr: r => P.alt(r.sExpr2, r.sExpr1, r.number),
  sExpr1: r => P.seqMap(r.iExpr, P.optWhitespace, r.plusOrMinus, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
  sExpr2: r => P.seqMap(r.iExpr, P.optWhitespace, r.multiplyOrDivide, P.optWhitespace, r.expr, (a, s1, b, s2, c) => [a, b, c]),
  iExpr: r => P.alt(r.iExpr, r.number),    // Issue here! this causes infinite recursion
  // iExpr: r => r.number  // this will fix infinite recursion but yields invalid parse

  number: () =>
    token(P.regexp(/[0-9]+/))
      .map(Number)
      .desc("number"),

  plus: () => word("+"),
  minus: () => word("-"),
  plusOrMinus: r => P.alt(r.plus, r.minus),

  multiply: () => word("*"),
  divide: () => word("/"),
  multiplyOrDivide: r => P.alt(r.multiply, r.divide),

  operator: r => P.alt(r.plusOrMinus, r.multiplyOrDivide)
});

///////////////////////////////////////////////////////////////////////

let text = "3 / 4 - 5 * 6 + 5";

let ast = MathParser.expr.tryParse(text);
console.log(util.inspect(ast, {showHidden: false, depth: null}));

This is my repo

Share Improve this question edited Dec 28, 2019 at 7:11 Node.JS asked Dec 28, 2019 at 7:05 Node.JSNode.JS 1,6347 gold badges55 silver badges131 bronze badges 6
  • One thing I forgot to mention is in P.alt order matters. So that's why I used r.sExpr2 and then r.sExpr1 – Node.JS Commented Dec 28, 2019 at 7:08
  • Sorry I had to fix a typo. My fix was iExpr: r => r.number which yields invalid parse tree – Node.JS Commented Dec 28, 2019 at 7:12
  • You have multiple circular references, expr is parsing sExpr1 that tries to map expr, also happens with sExpr2, and iExpr is triying to parse itself... Could you please explain what is are results you want to aplish? – ZeroWorks Commented Dec 30, 2019 at 18:01
  • I want to be able to parse simple Math expression: 1 + 2 / 3 + 4 * 5 – Node.JS Commented Dec 30, 2019 at 18:08
  • I meant what you expect to be the parsed result... why do you need r.sExpr2 and then r.sExpr1, since you are parsing a Math expression you need number + [plus|minus|multiply|divide], ignoring spaces... could you please clarify that please? – ZeroWorks Commented Dec 30, 2019 at 18:13
 |  Show 1 more ment

2 Answers 2

Reset to default 7 +50

Currently the grammar implemented by your parser looks like this (ignoring white space):

expr: sExpr2 | sExpr1 | number
sExpr1: iExpr plusOrMinus expr
sExpr2: iExpr multiplyOrDivide expr
// Version with infinite recursion:
iExpr: iExpr | number
// Version without infinite recursion:
iExpr: number

It's pretty easy to see that iExpr: iExpr is a left-recursive production and the cause of your infinite recursion. But even if Parsimmon could handle left-recursion, there just wouldn't be any point in that production. If it didn't mess up the parser, it just wouldn't do anything at all. Just like an equation x = x does not convey any information, a production x: x doesn't make a grammar match anything it didn't match before. It's basically a no-op, but one that breaks parsers that can't handle left recursion. So removing it is definitely the right solution.

With that fixed, you now get wrong parse trees. Specifically you'll get parse trees as if all your operators were of the same precedence and right-associative. Why? Because the left-side of all of your operators is iExpr and that can only match single numbers. So you'll always have leaves as the left child of an operator node and the tree always grows to the right.

An unambiguous grammar to correctly parse left-associative operators can be written like this:

expr: expr (plusOrMinus multExpr)?
multExpr: multExpr (multiplyOrDivide primaryExpr)?
primaryExpr: number | '(' expr ')'

(The | '(' expr ')' part is only needed if you want to allow parentheses, of course)

This would lead to the correct parse trees because there's no way for a multiplication or division to have an unparenthesized addition or subtraction as a child and if there are multiple applications of operators of the same precedence ,such as 1 - 2 - 3, the outer subtraction would contain the inner subtraction as its left child, correctly treating the operator as left-associative.

Now the problem is that this grammar is left recursive, so it's not going to work with Parsimmon. One's first thought might be to change the left recursion to right recursion like this:

expr: multExpr (plusOrMinus expr)?
multExpr: primaryExpr (multiplyOrDivide multExpr)?
primaryExpr: number | '(' expr ')'

But the problem with that is that now 1 - 2 - 3 wrongly associates to the right instead of the left. Instead, the mon solution is to remove the recursion altogether (except the one from primaryExpr back to expr of course) and replace it with repetition:

expr: multExpr (plusOrMinus multExpr)*
multExpr: primaryExpr (multiplyOrDivide primaryExpr)*
primaryExpr: number | '(' expr ')'

In Parsimmon you'd implement this using sepBy1. So now instead of having a left operand, an operator and a right operand, you have a left operand and then arbitrarily many operator-operand pairs in an array. You can create a left-growing tree from that by simply iterating over the array in a for-loop.

If your want to learn how to deal with left recursion, you can start from https://en.wikipedia/wiki/Parsing_expression_grammar or more precisely https://en.wikipedia/wiki/Parsing_expression_grammar#Indirect_left_recursion

And then read more about PEG online. But basically a standard way is to use cycles:

Expr    ← Sum
Sum     ← Product (('+' / '-') Product)*
Product ← Value (('*' / '/') Value)*
Value   ← [0-9]+ / '(' Expr ')'

Your can find examples of this grammar everywhere.

If your want to stick to a more pleasant left-recursive grammars, you can read about packrat parser. And find another parser for yourself. Because, I'm no sure, but it looks like parsimmon is not one of them.

If you just what a working code, than you can go to https://repl.it/repls/ObviousNavyblueFiletype

I've implemented the above grammar using parsimmon API

Expr        : r => r.AdditiveExpr,
AdditiveExpr: r => P.seqMap(
  r.MultExpr, P.seq(r.plus.or(r.minus), r.MultExpr).many(),
  left_association
),
MultExpr    : r => P.seqMap(
  r.UnaryExpr, P.seq(r.multiply.or(r.divide), r.UnaryExpr).many(),
  left_association
),
UnaryExpr   : r => P.seq(r.minus, r.UnaryExpr).or(r.PrimaryExpr),
PrimaryExpr : r => P.seqMap(
  r.LPAREN, r.Expr, r.RPAREN, // without parens it won't work
  (lp,ex,rp) => ex // to remove "(" and ")" from the resulting AST
).or(P.digits),

plus        : () => P.string('+').thru(p => p.skip(P.optWhitespace)),
minus       : () => P.string('-').thru(p => p.skip(P.optWhitespace)),
multiply    : () => P.string('*').thru(p => p.skip(P.optWhitespace)),
divide      : () => P.string('/').thru(p => p.skip(P.optWhitespace)),

We also need to use parentheses, or else why would you need recursion? Value ← [0-9]+ will suffice. Without parentheses, there is no need to reference Expr inside grammar. And if we do, it would not consume any input, it would have no sense whatsoever, and it would hang in infinite recursion.

So let's also add:

LPAREN      : () => P.string('(').thru(p => p.skip(P.optWhitespace)),
RPAREN      : () => P.string(')').thru(p => p.skip(P.optWhitespace)),

I also added unary expressions, just for pleteness.

And now for the most interesting part - production functions. Without them the result for, let' say:

(3+4+6+(-7*5))

will look like this:

[[["(",[["3",[]],[["+",["4",[]]],["+",["6",[]]],["+",[["(",[[["-","7"],[["*","5"]]],[]],")"],[]]]]],")"],[]],[]]

With them, it'll be:

[[[["3","+","4"],"+","6"],"+",[["-","7"],"*","5"]]]

Much nicer.

So for left associative operators we will need this:

// input (expr1, [[op1, expr2],[op2, expr3],[op3, expr4]])
// -->
// output [[[expr1, op1, expr2], op2, expr3], op3, expr4]
const left_association  = (ex, rest) => {
  // console.log("input", ex, JSON.stringify(rest))
  if( rest.length === 0 ) return ex
  if( rest.length === 1 ) return [ex, rest[0][0], rest[0][1]]
  let node = [ex]
  rest.forEach(([op, ex]) => {
    node[1] = op;
    node[2] = ex;
    node = [node]
  })
  // console.log("output", JSON.stringify(node))
  return node
}

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1744207324a4563166.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信