We desire to create a pushdown automaton (PDA) that uses the following "alphabet" (by alphabet I mean a unique set of symbol strings/keys):
+aaa
+bbb
+ccc
+ddd
+eee
-aaa
-bbb
-ccc
-ddd
-eee
The "symbols" (3-letter sequences with a + or - prefix) in this alphabet are used to make trees. Here are some example trees:
+eee-eee
+aaa+bbb-bbb-aaa
+bbb+aaa+ccc+eee-eee-ccc+ccc-ccc+ddd+ccc+eee-eee-ccc-ddd-aaa-bbb
Visualized as an actual tree, they would be more like:
+eee
-eee

+aaa
  +bbb
  -bbb
-aaa

+bbb
  +aaa
    +ccc
      +eee
      -eee
    -ccc
    +ccc
    -ccc
    +ddd
      +ccc
        +eee
        -eee
      -ccc
    -ddd
  -aaa
-bbb
So given this alphabet and these example trees, the question is how you would write a generic pushdown automaton to parse these trees. The rules are:
- Any letter pair (open/close pair) can have any number of nested children, and it doesn't matter what letter pairs are nested as children.
How would you write a pushdown automaton in JavaScript to parse a string into an AST?
By this I mean the implementation must literally have a stack, states, and transitions; not an ad-hoc parser, and not even a recursive descent parser. It should be an iterative while loop stepping through transitions somehow, not using recursion.
The output should be a very simple "AST" that just looks like this (for the +aaa+bbb-bbb-aaa tree):
{
  "type": "aaa",
  "children": [
    {
      "type": "bbb",
      "children": []
    }
  ]
}
I am wondering how to build a PDA so I can, in my specific case working on a custom programming language, convert a rather involved and complex AST I am working on and parse it into an object graph. That question is too complicated to write out on SO, and too hard to simplify, which is why I am asking here about a very basic PDA implementation in JavaScript.
I am interested to see how you keep track of the context at each branch point when doing the PDA algorithm and what the transitions look like.
Note: I've heard of / seen "2-state" PDAs occasionally mentioned here and there. It sounds like you have one state to do the computation, and one "final" state. Maybe even just a single state. If it's possible to do it sort of like this, then that would be cool. If not, don't worry about it.
If necessary, you can first write a Context Free Grammar and use that to dynamically construct a PDA, but the key question here is how the PDA is technically implemented. I think it's probably possible / straightforward to just write each transition function by hand, but I'm not entirely sure. Either way works for me.
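For concreteness, here is a rough sketch (my own notation, nothing official) of what such a grammar could look like, written as a plain JavaScript object; the nonterminal names Tree and Forest are placeholders I made up:
// Hypothetical CFG for the +xxx ... -xxx language, as a JS data structure.
// Tree   -> "+xxx" Forest "-xxx"   (one production per type xxx)
// Forest -> Tree Forest | ε
const types = ["aaa", "bbb", "ccc", "ddd", "eee"];
const grammar = {
  Tree: types.map(t => ["+" + t, "Forest", "-" + t]),
  Forest: [["Tree", "Forest"], []] // [] is the empty (ε) production
};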
- Are you after the JS code for a general deterministic PDA? Or are you after the PDA definition for your grammar? Or both? The former is likely readily available (it's just a transition map and a stack). The latter is just fancier parenthesis matching, also readily available. What specifically are you after and what have you tried and where are you stuck? – Welbog Commented May 28, 2021 at 10:40
- 1 "How would you write a pushdown automaton in JavaScript to parse a string into an AST?" - why are you trying to write a PDA? Why not have a look at parsers instead? You'd need to change the theoretical formalism of the PDA if you want to output something. – Bergi Commented May 28, 2021 at 12:43
- 2 So basically you're trying to reinvent XSLT? Or a parser generator? I don't see how this question of "How to parse a (tagged) parenthesis tree?" does help with that. – Bergi Commented May 28, 2021 at 13:00
- 1 It sounds like you're basically trying to design a piler. This is a huge undertaking. Look into high-level articles on piler design, specifically the main steps: tokenization, parsing and creating an execution tree. – Welbog Commented May 28, 2021 at 13:03
- 1 You may find Reg Braithwaite's related article interesting reading. – Scott Sauyet Commented May 28, 2021 at 19:34
5 Answers
It is not possible to create a PDA for producing this tree data structure because:
- a PDA has a finite number of states and stack symbols, and no output tape. Yet the number of trees that this PDA should be able to somehow represent (and make available) is infinite. If we consider the nodes of a tree in an object-oriented structure, then there are an infinite number of different nodes possible, as a node's identity is determined also by the references it has. This is in conflict with the finite set of symbols that a PDA has. If as alternative we would not go for an object-oriented output, then we gain nothing: the input already is a serialized representation of the tree.
- Even if you would add an output tape to this automaton, making it a rewrite system, you wouldn't gain much, since the input is already representing the tree in a serialized format. If I understood correctly, you are not interested in serialized output (like JSON, which would be trivial as the output order is the same as the input order), but structured output (JavaScript-like objects).
So, as others have also noted, we need to loosen the rules somehow. We can think of several (combinations of) ways to do that:
- Allow for an infinite number of states, or
- Allow for an infinite stack alphabet, or
- Allow the PDA to have access to more of the stack than just its top element
- Let the transition table refer to a particular attribute of the stack's top element, not the element as a whole -- allowing infinite possibilities for other attributes of the stacked elements, while ensuring that the values of this particular attribute belong to a finite set.
- Keep the tree building outside of the PDA
- ...Any other measures for solving the incompatibility of a PDA with the requirement to produce a structured tree.
Some of these options would imply an infinite transition table, in which case it cannot really be a table, but could be a function.
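For example (a quick sketch of that idea, not part of the implementation further below), such a function could accept any "+xxx"/"-xxx" pair without enumerating them:
// Transition *function* instead of a (potentially infinite) transition table:
// it accepts any "+xxx" opening tag and matches any "-xxx" closing tag
// against the value on top of the stack.
function transition(input, stackTop) {
    if (input.startsWith("+")) return { action: "push", value: input.slice(1) };
    if (input.startsWith("-") && input.slice(1) === stackTop) return { action: "pop" };
    return null; // no transition available: reject
}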
Implementation choices
1. For the generic engine
Considering your focus on tree-building, and the requirement to have "a stack, states, and transitions", I went for the following implementation choices, and sometimes simplifications:
- I take strategy 4 from the above list
- The generic engine will itself create the tree. So it is only generic in a limited sense. It can only validate and generate trees.
- It will (only) take a transition table as configuration input
- It has only two states (boolean), which indicate whether all is OK so far, or not.
- The OK-state can only go from true to false when there is no matching transition in the transition table
- Once the OK-state is false, the engine will not even attempt to find a transition, but will just ignore any further input.
- By consequence the transition table will not include the current state and future state parts, as they are both implied to be true (i.e. "OK").
- The tree will be encoded in the stack. The "data" property of a stack element will be the special attribute that serves to identify transitions. The "children" attribute of each stack element will serve to define the tree and is not considered part of the stack alphabet that the transition table targets.
- The stack starts out with one element. Its "children" attribute will be the output of the engine. At every step this output can be consulted, since this element is never removed from the stack.
- The input symbols cannot include the empty string, which is reserved for signalling the end of the input stream. This allows the engine to give feedback on whether that is a valid moment to end the input stream.
- The engine requires that its stack is "empty" (has just 1 entry) when the end of the input is indicated.
- I assumed that input symbols do not contain spaces: the space is used as a delimiter in the transition lookup algorithm
- The given transition table is converted to a (hash)map to allow for fast lookup given the current state and the data at the top of the stack. The keys in this map are concatenations of input and stack data values, and it is here that the space was chosen as delimiter. If a space is a valid character in an input symbol, then another encoding mechanism should be selected for serializing this key-pair (e.g. JSON), but I wanted to keep this simple.
- The engine does not need to get the set of input symbols, nor the set of stack symbols as configuration input, as these will be implied from the transition table.
- The engine's initial state is set to OK (it could be set differently by simple property assignment, but that would be useless as it would then ignore input)
2. For the specific +/- input format
The more specific part of the solution deals with the actual input format you have given in the question. One function will convert the set of types to a transition table, and another will tokenize the input based on the "+" and "-" characters, but without any validation (no symbol check, nor symbol length check, ...), as this will all surface as an error anyway when the engine is called.
Any white space in the input is ignored.
Implementation
// To peek the top element of a stack:
Array.prototype.peek = function () { return this[this.length - 1] };

function createParser(transitions) {
    function createNode(data) {
        return {
            data,
            children: []
        };
    }
    function addChild(parentNode, data) {
        const childNode = createNode(data);
        parentNode.children.push(childNode);
        return childNode;
    }
    let isOK = true; // It's a 2-state (boolean) engine. Transitions implicitly only apply to OK-state
    const stack = [createNode("")]; // Stack is private, and always has Sentinel node with value ""
    // Create a map for the transitions table, for faster lookup
    // We use space as a delimiter for the key pair, assuming an input symbol does not include spaces.
    const transMap = new Map(transitions.map(({whenInput, whenStack, thenPushValue}) =>
        [whenInput + " " + whenStack, thenPushValue]
    ));
    const parser = {
        read(input) { // Returns true when parser can accept more input after this one
            // Once the engine is in an error state, it will not process any further inputs
            if (!isOK) {
                return false;
            }
            // Consider the empty string as the end-of-input marker
            if (input === "") {
                // Even when state is OK, the stack should now also be "empty"
                isOK &&= stack.length === 1;
                return false; // Not an error condition, but indication that no more input is expected
            }
            // Transitions do not include state requirements, nor new state definitions.
            // It is assumed that a transition can only occur in an OK state, and that all
            // included transitions lead to an OK state.
            const pushValue = transMap.get(input + " " + stack.peek().data);
            if (pushValue === undefined) { // No matching transition in the table implies that state is not OK
                isOK = false;
            } else {
                // As this is a two-state engine, with defined transitions only between OK states,
                // each defined transition will affect the stack: so it's either a push or pop.
                // A push is identified by the (non-empty) value to be pushed. An empty string denotes a pop.
                if (pushValue) {
                    stack.push(addChild(stack.peek(), pushValue));
                } else {
                    stack.pop();
                }
            }
            return isOK;
        },
        get isOK() { return isOK }, // Expose the (boolean) state via a getter, so it reflects the current state
        output: stack[0].children // Don't expose the stack, but just the forest encoded in it
    };
    return parser;
}

function createTransition(whenInput, whenStack, thenPushValue) {
    return {whenInput, whenStack, thenPushValue}; // First two values imply the third
}

// Functions specific for the input format in the question:

function createTransitionTable(types) {
    // Specific to the input structure (with + and - tags) given in the question
    // An empty string in the second column represents an empty stack
    return [
        // Transitions for opening tags: O(n²)
        ...["", ...types].flatMap(stack =>
            types.map(type => createTransition("+" + type, stack, type))
        ),
        // Transitions for closing tags
        ...types.map(type => createTransition("-" + type, type, ""))
    ];
}

function tokenizer(str) { // Could be a generator, but I went for a function-returning function
    str = str.replace(/\s+/g, ""); // remove white space from input string
    let current = 0; // Maintain state between `getNextToken` function calls
    function getNextToken() {
        const start = current++;
        while (current < str.length && str[current] !== "+" && str[current] !== "-") {
            current++;
        }
        const token = str.slice(start, current);
        console.log("read", token); // For debugging
        return token;
    }
    return getNextToken;
}

// Example input language, as defined in the question:
const types = ["aaa", "bbb", "ccc", "ddd", "eee"];
const transitionTable = createTransitionTable(types);
const parser = createParser(transitionTable);

// Example input for it:
const rawInput = `
    +eee-eee
    +aaa+bbb-bbb-aaa
    +bbb+aaa+ccc+eee-eee-ccc+ccc-ccc+ddd+ccc+eee-eee-ccc-ddd-aaa-bbb`;

const getNextToken = tokenizer(rawInput);

// Process tokens
while (parser.read(getNextToken())) {}

console.log("Parser state is OK?: ", parser.isOK);
console.log("Parser output:\n");
console.log(JSON.stringify(parser.output, null, 3));
I should note at the outset that I'm no computer scientist and have no real experience writing compiler code. So there may be glaring holes in the implementation or even the basic ideas. But if you want the thoughts of a work-a-day programmer who found this an interesting problem, here they are.
We can write a pda function that simply recognizes our grammar, one that we can use like this. (Here we go only from aaa to ccc, but you could easily extend it to eee or whatever.)
const {push: PUSH, pop: POP} = pda

const myParser = pda ('S', ['S'], [
//                     ^     ^
//                     |     `------------------- accepting states
//                     +------------------------- initial state

//  +----------------------------------------- current state
//  |    +------------------------------------ token
//  |    |       +---------------------------- top of stack
//  |    |       |     +---------------------- new state
//  |    |       |     |    +----------------- stack action
//  V    V       V     V    V
  [ 'S', '+aaa', '',   'A', PUSH ('A') ],
  [ 'S', '+bbb', '',   'B', PUSH ('B') ],
  [ 'S', '+ccc', '',   'C', PUSH ('C') ],
  [ 'A', '+aaa', '',   'A', PUSH ('A') ],
  [ 'A', '-aaa', 'AA', 'A', POP        ],
  [ 'A', '-aaa', 'BA', 'B', POP        ],
  [ 'A', '-aaa', 'CA', 'C', POP        ],
  [ 'A', '-aaa', '',   'S', POP        ],
  [ 'A', '+bbb', '',   'B', PUSH ('B') ],
  [ 'A', '+ccc', '',   'C', PUSH ('C') ],
  [ 'B', '+aaa', '',   'A', PUSH ('A') ],
  [ 'B', '+bbb', '',   'B', PUSH ('B') ],
  [ 'B', '-bbb', 'AB', 'A', POP        ],
  [ 'B', '-bbb', 'BB', 'B', POP        ],
  [ 'B', '-bbb', 'CB', 'C', POP        ],
  [ 'B', '-bbb', '',   'S', POP        ],
  [ 'B', '+ccc', '',   'C', PUSH ('C') ],
  [ 'C', '+aaa', '',   'A', PUSH ('A') ],
  [ 'C', '+bbb', '',   'B', PUSH ('B') ],
  [ 'C', '+ccc', '',   'C', PUSH ('C') ],
  [ 'C', '-ccc', 'AC', 'A', POP        ],
  [ 'C', '-ccc', 'BC', 'B', POP        ],
  [ 'C', '-ccc', 'CC', 'C', POP        ],
  [ 'C', '-ccc', '',   'S', POP        ],
])
And we would use it to test a series of tokens, like this:
myParser (['+aaa', '-aaa']) //=> true
myParser (['+aaa', '-bbb']) //=> false
myParser (['+aaa', '+bbb', '+ccc', '-ccc', '+aaa', '-aaa', '-bbb', '-aaa']) //=> true
This does not stick exactly to the mathematical definition of a PDA. We don't have a symbol to delineate the beginning of the stack, and we test the top two values of the stack, not just the top one. But it's reasonably close.
However, this just reports whether a string of tokens is in the grammar. You want something more than that. You need to use this to build a syntax tree. It's very difficult to see how to do this in the abstract. But it's easy enough to generate a sequence of events from that parsing that you could use. One approach would be just to capture the new node value at every push to the stack and capture every pop from the stack.
With that, we might tie it to something like this:
const forestBuilder = () => { // multiple-rooted, so a forest not a tree
  const top = (xs) => xs [ xs .length - 1 ]
  const forest = {children: []}
  let stack = [forest]
  return {
    push: (name) => {
      const node = {name: name, children: []}
      top (stack) .children .push (node)
      stack.push (node)
    },
    pop: () => stack.pop(),
    end: () => forest.children
  }
}
const {push, pop, end} = forestBuilder ()
push ('aaa')
push ('bbb')
pop ()
push ('ccc')
push ('aaa')
pop()
pop()
pop()
push ('bbb')
push ('aaa')
end()
which would yield something like this:
[
  {
    "name": "aaa",
    "children": [
      {
        "name": "bbb",
        "children": []
      },
      {
        "name": "ccc",
        "children": [
          {
            "name": "aaa",
            "children": []
          }
        ]
      }
    ]
  },
  {
    "name": "bbb",
    "children": [
      {
        "name": "aaa",
        "children": []
      }
    ]
  }
]
So if we supply our pda function with some event listeners for the pushes and pops (also for completion and errors), we might be able to build your tree from a series of tokens.
Here is one attempt to do this:
console .clear ()

const pda = (() => {
  const PUSH = Symbol(), POP = Symbol()
  const id = (x) => x
  return Object .assign (
    (start, accepting, transitions) =>
      (tokens = [], onPush = id, onPop = id, onComplete = id, onError = () => false) => {
        let stack = []
        let state = start
        for (let token of tokens) {
          const transition = transitions .find (([st, tk, top]) =>
            st == state &&
            tk == token &&
            (top .length == 0 || stack .slice (-top.length) .join ('') == top)
          )
          if (!transition) {
            return onError (token, stack)
          }
          const [, , , nst, action] = transition
          state = nst
          action (stack)
          if (action [PUSH]) {onPush (token)}
          if (action [POP]) {onPop ()}
        }
        return onComplete (!!accepting .includes (state))
      },
    {
      push: (token) => Object.assign((stack) => stack .push (token), {[PUSH]: true}),
      pop: Object.assign ((stack) => stack .pop (), {[POP]: true}),
    }
  )
})()
const {push: PUSH, pop: POP} = pda

const myParser = pda ('S', ['S'], [
//                     ^     ^
//                     |     `------------------- accepting states
//                     +------------------------- initial state

//  +----------------------------------------- current state
//  |    +------------------------------------ token
//  |    |       +---------------------------- top of stack
//  |    |       |     +---------------------- new state
//  |    |       |     |    +----------------- stack action
//  V    V       V     V    V
  [ 'S', '+aaa', '',   'A', PUSH ('A') ],
  [ 'S', '+bbb', '',   'B', PUSH ('B') ],
  [ 'S', '+ccc', '',   'C', PUSH ('C') ],
  [ 'A', '+aaa', '',   'A', PUSH ('A') ],
  [ 'A', '-aaa', 'AA', 'A', POP        ],
  [ 'A', '-aaa', 'BA', 'B', POP        ],
  [ 'A', '-aaa', 'CA', 'C', POP        ],
  [ 'A', '-aaa', '',   'S', POP        ],
  [ 'A', '+bbb', '',   'B', PUSH ('B') ],
  [ 'A', '+ccc', '',   'C', PUSH ('C') ],
  [ 'B', '+aaa', '',   'A', PUSH ('A') ],
  [ 'B', '+bbb', '',   'B', PUSH ('B') ],
  [ 'B', '-bbb', 'AB', 'A', POP        ],
  [ 'B', '-bbb', 'BB', 'B', POP        ],
  [ 'B', '-bbb', 'CB', 'C', POP        ],
  [ 'B', '-bbb', '',   'S', POP        ],
  [ 'B', '+ccc', '',   'C', PUSH ('C') ],
  [ 'C', '+aaa', '',   'A', PUSH ('A') ],
  [ 'C', '+bbb', '',   'B', PUSH ('B') ],
  [ 'C', '+ccc', '',   'C', PUSH ('C') ],
  [ 'C', '-ccc', 'AC', 'A', POP        ],
  [ 'C', '-ccc', 'BC', 'B', POP        ],
  [ 'C', '-ccc', 'CC', 'C', POP        ],
  [ 'C', '-ccc', '',   'S', POP        ],
])
const forestBuilder = () => {
  const top = (xs) => xs [ xs .length - 1 ]
  const forest = {children: []}
  let stack = [forest]
  return {
    push: (name) => {
      const node = {name: name .slice (1), children: []}
      top (stack) .children .push (node)
      stack.push (node)
    },
    pop: () => stack.pop(),
    end: () => forest.children
  }
}

const {push, pop, end} = forestBuilder ()

console .log (myParser (
  ["+ccc", "-ccc", "+aaa", "+bbb", "-bbb", "-aaa", "+bbb", "+aaa", "+ccc", "+bbb", "-bbb", "-ccc", "+ccc", "-ccc", "+aaa", "+ccc", "+ccc", "-ccc", "-ccc", "-aaa", "-aaa", "-bbb"],
  push,
  pop,
  (accepted) => accepted ? end () : 'Error: ill-formed',
  (token, stack) => `Error: token = '${token}', stack = ${JSON.stringify(stack)}`
))
There are lots of ways this could go. Perhaps the opening event should contain not only the token but the value pushed on the stack. There might be a good way to generate that table of transitions from a more declarative syntax. We might want a different version of the stack action column, one that takes strings instead of functions. And so on. But it still might be a decent start.
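For instance, the table could be generated from a list of types along these lines (my own sketch, reusing the PUSH/POP helpers from above; not something I have polished):
// Generate the transition table for the pda function from a list of types.
// State names are the first letter of each type, uppercased ('aaa' -> 'A').
const makeTransitions = (types) => {
  const stateOf = (t) => t [0] .toUpperCase ()
  const rows = []
  for (const from of ['S', ...types .map (stateOf)]) {
    for (const t of types) {
      rows .push ([from, '+' + t, '', stateOf (t), PUSH (stateOf (t))]) // any state may open any tag
    }
  }
  for (const t of types) {
    for (const under of types) {
      rows .push ([stateOf (t), '-' + t, stateOf (under) + stateOf (t), stateOf (under), POP])
    }
    rows .push ([stateOf (t), '-' + t, '', 'S', POP]) // nothing underneath: back to the start state
  }
  return rows
}

// const myParser = pda ('S', ['S'], makeTransitions (['aaa', 'bbb', 'ccc']))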
In pseudocode, a DPDA can be implemented like this:
transitions <- map from (state, char, char) to (state, char[0-2])
stack <- stack initialized with a sentinel value 'Z'
input <- string of tokens to parse
currentState <- initial state
for each inputCharacter in input:
    stackCharacter <- stack.pop()
    currentState, charsToPush <- transitions[(currentState, inputCharacter, stackCharacter)]
    if currentState is not valid:
        return false
    for charToPush in charsToPush.reverse():
        stack.push(charToPush)
return (currentState in acceptingStates) and (stack contains only 'Z')
A PDA such as parenthesis matching is specified like this:
transitions <- {
    (0, '(', 'Z') -> (0, "(Z"),
    (0, '(', '(') -> (0, "(("),
    (0, ')', 'Z') -> nope,
    (0, ')', '(') -> (0, ""),
}
acceptingStates <- [0]
initialState <- 0
Note that the above is deterministic. General PDAs are nondeterministic and not all context-free languages can be decided by DPDAs. Yours can, but care has to be taken in how you specify transitions.
To make it more general (nondeterministic) the transition map needs to map to a list of (state, char[]) tuples instead of just one; each step in the loop needs to consider all matching tuples instead of just one.
Does that help?
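For what it's worth, a rough JavaScript rendering of that pseudocode (my own sketch; the string keys for the transition map are just one possible encoding) could look like this, using the parenthesis-matching transitions above:
// Deterministic PDA runner following the pseudocode above.
function runDPDA(transitions, acceptingStates, initialState, input) {
  const stack = ['Z'];                      // 'Z' is the bottom-of-stack sentinel
  let state = initialState;
  for (const ch of input) {
    const top = stack.pop();
    const result = transitions[`${state},${ch},${top}`];
    if (result === undefined) return false; // no transition: reject
    const [nextState, toPush] = result;
    state = nextState;
    for (const c of [...toPush].reverse()) stack.push(c);
  }
  return acceptingStates.includes(state) && stack.length === 1 && stack[0] === 'Z';
}

// Parenthesis matching, as specified above:
const parenTransitions = {
  '0,(,Z': [0, '(Z'],
  '0,(,(': [0, '(('],
  '0,),(': [0, ''],
};
console.log(runDPDA(parenTransitions, [0], 0, '(()())')); // true
console.log(runDPDA(parenTransitions, [0], 0, '(()'));    // false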
For your grammar specifically, your tokens are these "+aaa", "-aaa" things. Your alphabet is finite but very large, so you don't want to have to specify everything in your transition map. So here's a decision you have to make: do you want a pure PDA (fully specified map) or do you want to code a PDA-like thing that isn't quite a PDA to avoid that?
If the latter, you want to add a check in your loop that matches the identity of the + and - tokens. You might as well just write your own code at this point, because creating a generic parser that can handle everything is a ton of work. It's easier to just write a parser for your specific needs.
And this is why people have created libraries like flap.js, because this stuff is complicated.
To elaborate on a comment I made, if you make the transition function arbitrary instead of a map, you can express your language that way.
transition <- arbitrary function taking input (state, token, token) and output (state, token[0-2])
stack <- stack initialized with a sentinel token 'Z'
input <- string of tokens to parse
currentState <- initial state
for each inputToken in input:
    stackToken <- stack.pop()
    currentState, tokensToPush <- transition(currentState, inputToken, stackToken)
    if currentState is not valid:
        return false
    for tokenToPush in tokensToPush.reverse():
        stack.push(tokenToPush)
return (currentState in acceptingStates) and (stack contains only 'Z')
Define transition like this:
function transition(state, input, stack):
    if (input.type = +)
        return (0, [input, stack])
    else if (input.id = stack.id)
        return (0, [])
    else
        return nope
Where your tokens have a "type" (+ or -) and an "id" ("aaa", "bbb", etc).
You have to be careful with an arbitrary transition function, as the solution is now less constrained and more likely to accidentally not be context-free.
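A rough JavaScript sketch of that variant (my own code, with tokens as { type, id } objects rather than anything this answer prescribes):
const SENTINEL = { id: 'Z' }; // bottom-of-stack marker

function transition(state, input, stackTop) {
  if (input.type === '+') return [0, [input, stackTop]]; // keep stackTop, push the open tag
  if (input.id === stackTop.id) return [0, []];          // matching close tag: net pop
  return null;                                           // no valid move: reject
}

function accepts(tokens) {
  const stack = [SENTINEL];
  let state = 0;
  for (const input of tokens) {
    const result = transition(state, input, stack.pop());
    if (result === null) return false;
    const [nextState, toPush] = result;
    state = nextState;
    for (const t of [...toPush].reverse()) stack.push(t);
  }
  return state === 0 && stack.length === 1 && stack[0] === SENTINEL;
}

// "+aaa+bbb-bbb-aaa"
console.log(accepts([
  { type: '+', id: 'aaa' }, { type: '+', id: 'bbb' },
  { type: '-', id: 'bbb' }, { type: '-', id: 'aaa' },
])); // true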
The solution below works in three stages:
- The input string is tokenized into sequences and prefixes
- The tokenized result, as an array, is converted to an AST via transition rules
- The AST is converted to the desired JSON output.
First, defining the tokens, transitions, and tokenization function:
class Token{
  constructor(vals, t_type){
    this.vals = vals;
    this.t_type = t_type
  }
  type_eq(t_type){
    return this.t_type === t_type
  }
  static val_eq(t1, t2){
    //check if two tokens objects t1 and t2 represent a +tag and a -tag
    return t1.t_type === 'o_tag' && t2.t_type === 'c_tag' && t1.vals[1].vals[0] === t2.vals[1].vals[0]
  }
}
var lexer = [[/^\-/, 'minus'], [/^\+/, 'plus'], [/^[a-z]+/, 'label']];
var transitions = [
  {'pattern':['plus', 'label'], 'result':'o_tag', 't_eq':false},
  {'pattern':['minus', 'label'], 'result':'c_tag', 't_eq':false},
  {'pattern':['o_tag', 'c_tag'], 'result':'block', 't_eq':true},
  {'pattern':['o_tag', 'block', 'c_tag'], 'result':'block', 't_eq':true},
  {'pattern':['block', 'block'], 'result':'block', 't_eq':false}
]
function* tokenize(s){
  //tokenize an input string `s`
  //@yield Token object
  while (s.length){
    for (var [p, t] of lexer){
      var m = s.match(p)
      if (m){
        yield (new Token([m[0]], t))
        s = s.substring(m[0].length)
        break
      }
    }
  }
}
Next, defining a function that takes in the tokenized string and runs an iterative shift-reduce on a stack to build the AST from the bottom up:
function pattern_match(stack, pattern){
  //takes in the stack from `shift_reduce` and attempts to match `pattern` from the back
  if (pattern.length > stack.length){
    return false
  }
  return Array.from(Array(pattern.length).keys()).every(x => stack[stack.length-1-x].type_eq(pattern[pattern.length - 1-x]))
}
function shift_reduce(tokens){
  //consumes `tokens` until empty and returns the resulting tree if in a valid state
  var stack = []
  while (true){
    //per your comment, the line below displays the contents of the stack at each iteration
    console.log(stack.map(x => x.t_type))
    if (!stack.length){
      //stack is empty, push a token on to it
      stack.push(tokens.shift())
    }
    var f = false;
    for (var {pattern:p, result:r, t_eq:tq} of transitions){
      //try to match patterns from `transitions`
      if (pattern_match(stack, p)){
        var new_vals = p.map(_ => stack.pop()).reverse();
        if (!tq || Token.val_eq(new_vals[0], new_vals[new_vals.length-1])){
          //match found
          f = true
          stack.push((new Token(new_vals, r)))
          break
        }
        else{
          while (new_vals.length){
            stack.push(new_vals.shift())
          }
        }
      }
    }
    if (!f){
      if (!tokens.length){
        //no more tokens to consume, return root of the token tree.
        if (stack.length > 1){
          //stack was not reduced to a single token, thus an invalid state
          throw new Error('invalid state')
        }
        return stack[0]
      }
      //no match found, push another token from `tokens` onto the stack
      stack.push(tokens.shift())
    }
  }
}
Lastly, a function to convert the AST to JSON:
function* to_json(tree){
  if (tree.vals.every(x => x.t_type === 'block')){
    for (var i of tree.vals){
      yield* to_json(i)
    }
  }
  else{
    yield {'type':tree.vals[0].vals[1].vals[0], ...(tree.vals.length === 2 ? {} : {'children':[...to_json(tree.vals[1])]})}
  }
}
Putting it all together:
function to_tree(s){
  var tokens = [...tokenize(s)] //get the tokenized string
  var tree = shift_reduce(tokens) //build the AST from the tokens
  var json_tree = [...to_json(tree)] //convert AST to JSON
  return json_tree
}
console.log(to_tree('+eee-eee'))
console.log(to_tree('+aaa+bbb-bbb-aaa'))
console.log(to_tree('+bbb+aaa+ccc+eee-eee-ccc+ccc-ccc+ddd+ccc+eee-eee-ccc-ddd-aaa-bbb'))
Output:
[
  {
    "type": "eee"
  }
]
[
  {
    "type": "aaa",
    "children": [
      {
        "type": "bbb"
      }
    ]
  }
]
[
  {
    "type": "bbb",
    "children": [
      {
        "type": "aaa",
        "children": [
          {
            "type": "ccc",
            "children": [
              {
                "type": "eee"
              }
            ]
          },
          {
            "type": "ccc"
          },
          {
            "type": "ddd",
            "children": [
              {
                "type": "ccc",
                "children": [
                  {
                    "type": "eee"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
]
I agree with @trincot's answer (except for his claim that it isn't a PDA).
I'm not sure about the complicated pattern, but the simple one you have is nearly trivial to build a machine for. It is a DCFG (deterministic context free grammar), i.e. it is the intersection of a regular expression and a Dyck (parenthesis matching) machine. All DCFGs encode a PDA. Thus, my disagreement where he says it isn't a PDA.
There is a regular-expression part, because your "tokens" (the parentheses) are not a single character long, so you need to write a regular expression that turns those character sequences into single-symbol tokens: +aaa -> one token, e.g. '(', -aaa -> another token ')', +bbb -> still another '[', ... Note the characters I picked for the tokens aren't arbitrary (although they could be) but are chosen to help you visualize this as parenthesis matching. Note how they pair up.
Your list of tokens will be finite (your strings, while unbounded, are still finite). And there will be two (or three) types of tokens: left parens (brackets, etc.), right parens, and things that are neither (i.e. that don't need to match). On a left paren, you push something on the stack. On a right paren, you pop the stack. On a neither, you either ignore the stack or push and pop both--both models work.
The FSM that runs the machine needs one state for each pair. On a push you enter that state and that tells you what kind of token you need to see to pop it. If you see a different popping token, you have an error.
Now, as long as your token types are easily divided into these three types of tokens, the problem is trivial. If your tokens aren't, for example if you are looking for palindromes without a mid-point token (i.e. some token can be both a left paren and a right paren and you cannot tell from left context which it is), the problem becomes non-deterministic and you will need to implement a GLR-type parser that keeps a parse forest of alternatives that are still candidates (and, if the input is ambiguous, ends up with more than one possible tree).
However, I think if you are trying to parse ASTs, you won't have that issue. You really have a very simplified version of the SLR (most basic LR parsing algorithm) at the parens level. And the conversion of sequences to regular expressions is also likely to be trivial, because they will just be a set of fixed strings.
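As a rough sketch of that idea (my own code, not something this answer provides): lex the fixed strings into open/close tokens, push on an opening token, and pop-and-compare on a closing one.
// Dyck-style matcher over "+xxx"/"-xxx" tokens: the pushed value records which
// closing token is expected, playing the role of a "state per pair".
function matches(input, types = ['aaa', 'bbb', 'ccc', 'ddd', 'eee']) {
  const tokenRe = new RegExp(`[+-](?:${types.join('|')})`, 'g'); // the "regular" lexing layer
  const tokens = input.match(tokenRe) ?? [];
  const stack = [];
  for (const tok of tokens) {
    const kind = tok[0], type = tok.slice(1);
    if (kind === '+') stack.push(type);          // left paren: push
    else if (stack.pop() !== type) return false; // right paren: must match the top
  }
  return stack.length === 0;
}

console.log(matches('+aaa+bbb-bbb-aaa')); // true
console.log(matches('+aaa-bbb'));         // false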