The lexical grammar of ECMAScript lists the following token classes for the lexical analyzer (lexer):
InputElementDiv ::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    DivPunctuator
    RightBracePunctuator

InputElementRegExp ::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    RightBracePunctuator
    RegularExpressionLiteral

InputElementRegExpOrTemplateTail ::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    RegularExpressionLiteral
    TemplateSubstitutionTail

InputElementTemplateTail ::
    WhiteSpace
    LineTerminator
    Comment
    CommonToken
    DivPunctuator
    TemplateSubstitutionTail

While I understand the nested classes like WhiteSpace and LineTerminator, I don't understand what the top-level classes are: InputElementDiv, InputElementRegExp, InputElementRegExpOrTemplateTail and InputElementTemplateTail. Can anyone please clarify?
- Each top-level class represents any one of the productions that follow its ::. Is that what you meant? Does this help? ecma-international/ecma-262/8.0/… – spanky Commented Aug 16, 2017 at 20:28
- Did you even read the note at the spec section you linked? – Bergi Commented Aug 16, 2017 at 21:23
- @Bergi I'm doing a writeup. I think that part is hard to follow if you don't already know what it's saying. – loganfsmyth Commented Aug 16, 2017 at 21:31
1 Answer
Definitely not obvious; I had my own struggle decoding all this at one point. The important note is in https://www.ecma-international.org/ecma-262/8.0/index.html#sec-ecmascript-language-lexical-grammar. Specifically:
There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context that is consuming the input elements. This requires multiple goal symbols for the lexical grammar. The InputElementRegExpOrTemplateTail goal is used in syntactic grammar contexts where a RegularExpressionLiteral, a TemplateMiddle, or a TemplateTail is permitted. The InputElementRegExp goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted but neither a TemplateMiddle, nor a TemplateTail is permitted. The InputElementTemplateTail goal is used in all syntactic grammar contexts where a TemplateMiddle or a TemplateTail is permitted but a RegularExpressionLiteral is not permitted. In all other contexts, InputElementDiv is used as the lexical goal symbol.
with the key part up front:
There are several situations where the identification of lexical input elements is sensitive to the syntactic grammar context
Keep in mind that this is the lexical grammar definition, so all it aims to do is produce a set of tokens.
So let's break that down more. Consider a snippet like this:
/foo/g
With no context given, there are two ways to interpret this:
DivPunctuator IdentifierName DivPunctuator IdentifierName
"/" "foo" "/" "g"
RegularExpressionLiteral
"/foo/g"
Looking only at the characters, a lexer does not have enough information to know which of these to select. This means the lexer needs a flag like expectRegex or something similar, one that toggles the behavior based not just on the current sequence of characters but also on previously encountered tokens. Something needs to say "expect an operator next" or "expect a regex literal next".
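To make the flag idea concrete, here is a minimal sketch of how a lexer might branch on it. The helper name readSlashToken and its simplified regex pattern are my own assumptions for illustration, not anything defined by the spec, and a real lexer has to handle character classes and line terminators that this ignores.

```javascript
// Hypothetical helper (not from the spec): reads one token that starts with "/".
// `expectRegex` is the flag described above; the caller decides its value.
function readSlashToken(source, pos, expectRegex) {
  if (source[pos] !== "/") {
    throw new Error("this sketch only handles tokens that start with '/'");
  }
  if (expectRegex) {
    // Simplified pattern: a body of non-slash characters or escapes, then flags.
    // It ignores character classes such as [/] and line terminators.
    const regexLiteral = /\/(?:[^\/\\]|\\.)+\/[A-Za-z]*/y;
    regexLiteral.lastIndex = pos; // sticky flag: match must start exactly at pos
    const match = regexLiteral.exec(source);
    if (match === null) {
      throw new Error("unterminated regular expression literal");
    }
    return { type: "RegularExpressionLiteral", text: match[0] };
  }
  // Otherwise the slash is a division operator ("/" or "/=").
  const text = source[pos + 1] === "=" ? "/=" : "/";
  return { type: "DivPunctuator", text };
}

// Same characters, two different first tokens:
console.log(readSlashToken("/foo/g", 0, true));  // type "RegularExpressionLiteral", text "/foo/g"
console.log(readSlashToken("/foo/g", 0, false)); // type "DivPunctuator", text "/"
```

The point is only that the characters alone never decide the outcome; the flag passed in by the caller does.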
The same is true for the following:
}foo${
RightBracePunctuator IdentifierName Punctuator
"}" "foo$" "{"
TemplateMiddle
"}foo${"
A second toggle needs to be used for this case.
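The template case can be sketched the same way. Again, readRightBraceToken and the expectTemplate parameter are hypothetical names chosen for this example, and the scanning is simplified (it only skips backslash escapes).

```javascript
// Hypothetical helper (not from the spec): reads one token that starts with "}".
function readRightBraceToken(source, pos, expectTemplate) {
  if (source[pos] !== "}") {
    throw new Error("this sketch only handles tokens that start with '}'");
  }
  if (!expectTemplate) {
    return { type: "RightBracePunctuator", text: "}" };
  }
  // Inside a template literal: scan forward to "${" (TemplateMiddle)
  // or to the closing backtick (TemplateTail).
  let i = pos + 1;
  while (i < source.length) {
    const ch = source[i];
    if (ch === "\\") {
      i += 2; // skip the escaped character
      continue;
    }
    if (ch === "`") {
      return { type: "TemplateTail", text: source.slice(pos, i + 1) };
    }
    if (ch === "$" && source[i + 1] === "{") {
      return { type: "TemplateMiddle", text: source.slice(pos, i + 2) };
    }
    i += 1;
  }
  throw new Error("unterminated template literal");
}

console.log(readRightBraceToken("}foo${", 0, true));  // type "TemplateMiddle", text "}foo${"
console.log(readRightBraceToken("}foo${", 0, false)); // type "RightBracePunctuator", text "}"
```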
So that leaves us with a nice table of the 4 options that you've seen:
| expectRegex | expectTemplate | InputElement |
| ----------- | -------------- | -------------------------------- |
| false | false | InputElementDiv |
| false | true | InputElementTemplateTail |
| true | false | InputElementRegExp |
| true | true | InputElementRegExpOrTemplateTail |
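As a rough illustration of the table, the two flags can be mapped to a goal symbol and to the alternatives it allows. The function lexicalGoal is a made-up name for this sketch; the alternative lists are taken from the grammar quoted in the question, where the four goals differ only in the last two alternatives.

```javascript
// Alternatives shared by all four goal symbols in the grammar above.
const SHARED = ["WhiteSpace", "LineTerminator", "Comment", "CommonToken"];

// Hypothetical helper: picks the lexical goal symbol from the two flags.
function lexicalGoal(expectRegex, expectTemplate) {
  const name = expectRegex
    ? (expectTemplate ? "InputElementRegExpOrTemplateTail" : "InputElementRegExp")
    : (expectTemplate ? "InputElementTemplateTail" : "InputElementDiv");
  // Each flag swaps exactly one alternative (listing order is not significant here).
  const alternatives = [
    ...SHARED,
    expectRegex ? "RegularExpressionLiteral" : "DivPunctuator",
    expectTemplate ? "TemplateSubstitutionTail" : "RightBracePunctuator",
  ];
  return { name, alternatives };
}

console.log(lexicalGoal(false, false).name); // InputElementDiv
console.log(lexicalGoal(true, true).name);   // InputElementRegExpOrTemplateTail
```

Seen this way, each flag trades one ambiguous alternative for another: "/" as DivPunctuator versus RegularExpressionLiteral, and "}" as RightBracePunctuator versus TemplateSubstitutionTail.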
And the spec then covers when these flags toggle:
- InputElementRegExpOrTemplateTail: This goal is used in syntactic grammar contexts where a RegularExpressionLiteral, a TemplateMiddle, or a TemplateTail is permitted.
- InputElementRegExp: This goal symbol is used in all syntactic grammar contexts where a RegularExpressionLiteral is permitted but neither a TemplateMiddle, nor a TemplateTail is permitted.
- InputElementTemplateTail: This goal is used in all syntactic grammar contexts where a TemplateMiddle or a TemplateTail is permitted but a RegularExpressionLiteral is not permitted.
- InputElementDiv: This goal is used in all other contexts.