[antlr-interest] Question about lexer/parser boundaries
Phil Oliver
antlr at olivercomputing.com
Mon Jun 4 14:17:54 PDT 2007
>I think you are confusing the tokens as implied by the language you are
>parsing (as in TOKEN1 TOKEN2 are to be treated as one unit by the parser
>that is parsing the query) with the tokenability of the input language,
>which is not the same thing. You look for the construct above as:
>compound_element : LBRACKET TOKEN1 TOKEN2 RBRACKET ;
Absolutely not. I understand the difference and that it is not what
their grammar means. I refer you to this:
http://www.w3.org/TR/xquery-xpath-parsing/
in section A.1 EBNF:
...
The following grammars use the same simple Extended Backus-Naur Form
(EBNF) notation as [XML 1.0] with the following minor differences.
The notation "< ... >" is used to indicate a grouping of terminals
that together may help disambiguate the individual symbols.
...
A concrete example of one of the rules is:
[5] SimpleForClause ::= <"for" "$"> VarName "in" ExprSingle
                        ("," "$" VarName "in" ExprSingle)*
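To make the reading of production [5] concrete, here is a hypothetical recursive-descent sketch of it in Java. It assumes the grouped terminals <"for" "$"> reach the parser as one token (spelled "for$" here), that tokens are plain strings, and that ExprSingle is simplified to a single name; none of these names come from the spec or from ANTLR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of [5] SimpleForClause. Assumptions (not from the spec):
// tokens are plain strings, <"for" "$"> arrives as ONE token "for$",
// VarName is an identifier-like token, ExprSingle is simplified to one name.
public class SimpleForClauseParser {
    private final List<String> tokens;
    private int pos = 0;

    public SimpleForClauseParser(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String expected) {
        String t = peek();
        if (!t.equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + t);
        }
        pos++;
    }

    private String name() {
        String t = peek();
        if (!t.matches("[A-Za-z_][A-Za-z0-9_.-]*")) {
            throw new IllegalStateException("expected a name but found " + t);
        }
        pos++;
        return t;
    }

    // SimpleForClause ::= <"for" "$"> VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)*
    // Returns the bound variable names in order.
    public List<String> simpleForClause() {
        List<String> vars = new ArrayList<>();
        expect("for$");       // grouped terminal, consumed as a single token
        vars.add(name());     // VarName
        expect("in");
        name();               // ExprSingle (simplified)
        while (peek().equals(",")) {
            expect(",");
            expect("$");      // subsequent bindings use a plain "$"
            vars.add(name());
            expect("in");
            name();
        }
        return vars;
    }

    public static void main(String[] args) {
        List<String> toks = List.of("for$", "x", "in", "a", ",", "$", "y", "in", "b");
        System.out.println(new SimpleForClauseParser(toks).simpleForClause()); // [x, y]
    }
}
```

Note that in this sketch only the first binding depends on the combined token; the repeated bindings still see "$" on its own.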
<"for" "$"> absolutely does not mean literal left and right angle-bracket
characters; the brackets denote that 'for $' should effectively be
treated as a single unit (and so I define a lexer rule that matches
occurrences of 'for' '$' as a single token).
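The lexer-side idea can be sketched by hand in Java: when the word "for" is followed (after optional whitespace) by "$", emit one FOR_DOLLAR token; otherwise "for" is just a name, since XPath/XQuery keywords are not reserved and "for" may also be an element name. This is a hypothetical hand-written lexer, not generated ANTLR code. The token kinds, the name character set, and the decision to ignore XQuery comments between "for" and "$" are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer: "for" + optional whitespace + "$" becomes ONE
// token FOR_DOLLAR, mirroring the grouping <"for" "$">. A "for" not followed by
// "$" is an ordinary NAME. Token kinds and name characters are simplified.
public class ForDollarLexer {

    public static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < n && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                String word = input.substring(start, i);
                if (word.equals("for")) {
                    // Look ahead past whitespace without committing to it yet.
                    int j = i;
                    while (j < n && Character.isWhitespace(input.charAt(j))) {
                        j++;
                    }
                    if (j < n && input.charAt(j) == '$') {
                        out.add("FOR_DOLLAR");
                        i = j + 1; // consume the whitespace and the '$'
                        continue;
                    }
                }
                out.add("NAME(" + word + ")");
            } else if (c == '$') {
                out.add("DOLLAR");
                i++;
            } else {
                out.add("PUNCT(" + c + ")");
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("for $x in a")); // [FOR_DOLLAR, NAME(x), NAME(in), NAME(a)]
        System.out.println(tokenize("for/x"));       // [NAME(for), PUNCT(/), NAME(x)]
    }
}
```

Because the whole word is matched before the lookahead, an input such as "forx $" yields NAME(forx) followed by DOLLAR rather than a false FOR_DOLLAR.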
Thanks for your assessment that the other examples are fine. Can you
or Terence suggest a definite, hard-and-fast technique for deciding
whether a rule should be a lexer rule or a parser rule?
The most frustrating thing is that I have a grammar that the latest
Antlrworks reports as having "no grammar errors", yet it still fails
with an out-of-memory error despite over a gigabyte being allocated to
the VM. My experience so far suggests this points to an unidentified
grammar problem that leads to an infinite loop somewhere. (When I copy
the generated code into an Eclipse project and debug it there, it runs
out of memory at this point:
static {
    int numStates = DFA13_transitionS.length;
    DFA13_transition = new short[numStates][];
    for (int i=0; i<numStates; i++) {
        DFA13_transition[i] = DFA.unpackEncodedString(DFA13_transitionS[i]);
    }
}
)
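One plausible way an initializer like this can exhaust a large heap without any infinite loop: each per-state string is run-length encoded as (count, value) character pairs, and decoding allocates one short per counted entry. The sketch below is a re-implementation modeled on how ANTLR 3's DFA.unpackEncodedString decodes these tables; the class and method names are hypothetical, and whether this explains this particular grammar's failure is not established.

```java
import java.util.Arrays;

// Hypothetical re-implementation of the run-length scheme used for generated
// DFA tables: the encoded string is a sequence of (count, value) char pairs.
public class RunLengthTable {

    // Decode (count, value) pairs into a flat short[]; its length is the sum of all counts.
    public static short[] unpack(String encoded) {
        int size = 0;
        for (int i = 0; i < encoded.length(); i += 2) {
            size += encoded.charAt(i);
        }
        short[] data = new short[size];
        int di = 0;
        for (int i = 0; i < encoded.length(); i += 2) {
            char count = encoded.charAt(i);
            char value = encoded.charAt(i + 1);
            for (int k = 0; k < count; k++) {
                data[di++] = (short) value; // 0xFFFF casts to (short) -1
            }
        }
        return data;
    }

    // Total shorts a static initializer would allocate across all states.
    public static long decodedShorts(String[] perStateEncoded) {
        long total = 0;
        for (String s : perStateEncoded) {
            for (int i = 0; i < s.length(); i += 2) {
                total += s.charAt(i);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // A single pair: 65535 copies of 0xFFFF. Two chars encode 65535 shorts.
        String oneState = "\uffff\uffff";
        System.out.println(oneState.length() + " chars -> " + unpack(oneState).length + " shorts");

        // Hypothetical DFA with 10,000 states whose rows each span the full char range.
        String[] states = new String[10_000];
        Arrays.fill(states, oneState);
        long shorts = decodedShorts(states);
        System.out.println(shorts + " shorts = " + (shorts * 2) + " bytes");
    }
}
```

With these made-up numbers the decoded tables alone would need about 1.3 GB (655,350,000 shorts), so a lexer DFA with many states and wide character ranges could plausibly exceed a one-gigabyte heap even though the encoded strings in the class file are small.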
It is frustrating that ANTLR/Antlrworks doesn't appear to flag all
possible problems, which makes it very hard to debug a complex
grammar translation.