[antlr-interest] Question about lexer/parser boundaries

Mon Jun 4 14:17:54 PDT 2007

 >I think you are confusing the tokens as implied by the language you are
 >parsing (as in TOKEN1 TOKEN2 are to be treated as one unit by the parser
 >that is parsing the query) with the tokenability of the input language,
 >which is not the same thing. You look for the construct above as:

 >compound_element : LBRACKET TOKEN1 TOKEN2 RBRACKET ;

Absolutely not. I understand the difference and that it is not what 
their grammar means. I refer you to this:

http://www.w3.org/TR/xquery-xpath-parsing/

in section A.1 EBNF:
...
The following grammars use the same simple Extended Backus-Naur Form 
(EBNF) notation as [XML 1.0] with the following minor differences. 
The notation "< ... >" is used to indicate a grouping of terminals 
that together may help disambiguate the individual symbols.
...

A concrete example of one of the rules is:

[5]  SimpleForClause    ::=    <"for" "$"> VarName "in" ExprSingle
                                ("," "$" VarName "in" ExprSingle)*

<"for" "$"> absolutely does not mean literal character left and right 
angle brackets; they denote that 'for $' should effectively be treated 
as a single unit (and thus I define a lexer rule that matches 
occurrences of 'for' '$' as a single token).
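
To make the single-unit idea concrete, here is a minimal sketch in plain Java rather than ANTLR grammar syntax. It is a hand-rolled tokenizer, not anything ANTLR generates; the class name ForDollarSketch, the token names, and the assumption that optional whitespace may sit between 'for' and '$' are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: emit 'for' followed by '$' as ONE token.
class ForDollarSketch {
    // Alternatives are tried left to right, so the combined form is
    // attempted before a plain name. Allowing whitespace between
    // 'for' and '$' is an assumption of this sketch.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<FORDOLLAR>for\\s*\\$)|(?<DOLLAR>\\$)"
        + "|(?<NAME>[A-Za-z_][A-Za-z0-9_]*)|(?<WS>\\s+)");

    static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at " + pos);
            }
            if (m.group("FORDOLLAR") != null) {
                types.add("FOR_DOLLAR");
            } else if (m.group("DOLLAR") != null) {
                types.add("DOLLAR");
            } else if (m.group("NAME") != null) {
                types.add("NAME");
            }
            // Whitespace matches are consumed but not reported.
            pos = m.end();
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("for $x in y"));
    }
}
```

In this sketch an identifier such as "format" still comes out as a plain NAME, because the combined alternative only succeeds when a '$' actually follows 'for'.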

Thanks for your assessment that the other examples are fine. Can you 
or Terence comment on some definite hard-and-fast technique for 
assessing whether a rule should be lexer or parser?

The most frustrating thing is that I have a grammar that the latest 
ANTLRWorks declares as having "no grammar errors", but it blows up 
with an out-of-memory error anyway, despite having over a gig 
allocated to the VM. My experience so far suggests this indicates an 
unidentified grammar problem that leads to an infinite loop somewhere. 
(When I copy the generated code to an Eclipse project and debug it 
there, it's going out of memory at this point:

     static {
         int numStates = DFA13_transitionS.length;
         DFA13_transition = new short[numStates][];
         for (int i=0; i<numStates; i++) {
             DFA13_transition[i] = DFA.unpackEncodedString(DFA13_transitionS[i]);
         }
     }
)
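
For context on why a static block like the one above can be memory-hungry, here is a minimal sketch of a run-length unpacking step. It assumes the encoded string is a sequence of (count, value) character pairs; the class UnpackSketch and its unpack method are hypothetical stand-ins for illustration, not the ANTLR runtime source.

```java
import java.util.Arrays;

// Hypothetical stand-in for a run-length decoder of DFA table rows.
class UnpackSketch {
    // Assumption: each pair of chars is (repeat count, value).
    static short[] unpack(String encoded) {
        // First pass: add up the counts to size the output array.
        int size = 0;
        for (int i = 0; i + 1 < encoded.length(); i += 2) {
            size += encoded.charAt(i);
        }
        short[] data = new short[size];
        int di = 0;
        // Second pass: write each value 'count' times.
        for (int i = 0; i + 1 < encoded.length(); i += 2) {
            char count = encoded.charAt(i);
            char value = encoded.charAt(i + 1);
            for (int j = 0; j < count; j++) {
                data[di++] = (short) value;
            }
        }
        return data;
    }

    public static void main(String[] args) {
        // Two pairs: three 7s, then two 1s.
        short[] row = unpack("\u0003\u0007\u0002\u0001");
        System.out.println(Arrays.toString(row)); // prints [7, 7, 7, 1, 1]
    }
}
```

Under this assumption a short encoded string expands into a much longer array, and the loop does that once per DFA state, so a decision with a very large number of states could plausibly exhaust the heap during class initialization even though nothing loops forever.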

It is frustrating that ANTLR/Antlrworks doesn't appear to be flagging 
all possible problems, which makes it very hard to debug a complex 
grammar translation.
