[antlr-interest] Identifiers with Spaces
Michael Bosch
hirbli at nettmail.de
Mon Nov 29 14:23:50 PST 2010
Hi John!
On Fri, 2010-11-26 at 21:38 -0500, John B. Brodie wrote:
> i suggest something like this (untested):
>
> ID : ID_HEAD ( ' '* ID_TAIL )* ;
> fragment ID_HEAD : LETTER ;
> fragment ID_TAIL : LETTER | DIGIT | '_' ;
That is what I tried. I just reduced it to the bare minimum to
demonstrate the problem.
> > - Why is the lexer for test2 only using a 1 character lookahead?
>
> because that is all that is needed to disambiguate the situation. recall
> that the lexer operates without any knowledge of parsing context. so, to
> the lexer, (assuming a rule like ID:LETTER(' '|LETTER)*) "a " is clearly
> an ID and not an 'a' followed by ' '.
I tend to disagree: For the input
    iden tifier =
the first space is a continuation of the ID, but the second space
is just whitespace to be ignored. To distinguish between the two cases
the lexer would have to look past the spaces.
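To make the point concrete, here is a small Python sketch of the scan I have in mind. It is not ANTLR output and not what the generated lexer does. The name scan_id and the helper is_tail are made up for illustration, and the LETTER/DIGIT classes are approximated with str.isalpha/str.isalnum. The idea is only that, on hitting a space, the scanner peeks past the whole run of spaces before deciding whether the identifier continues.

```python
def scan_id(text, pos=0):
    """Return the end index of the identifier starting at pos (pos if none).

    Hypothetical helper: ID_HEAD is a letter; ID_TAIL is a letter, digit or '_'.
    """
    def is_tail(ch):
        # Approximation of LETTER | DIGIT | '_' (an assumption, not the grammar itself).
        return ch.isalnum() or ch == "_"

    if pos >= len(text) or not text[pos].isalpha():
        return pos
    end = pos + 1
    while end < len(text):
        # Skip a (possibly empty) run of spaces, then peek at what follows it.
        look = end
        while look < len(text) and text[look] == " ":
            look += 1
        if look < len(text) and is_tail(text[look]):
            end = look + 1  # any skipped spaces were inside the identifier
        else:
            break           # trailing spaces are left for the whitespace rule
    return end


text = "iden tifier ="
print(repr(text[:scan_id(text)]))
```

With a fixed 1-character lookahead the decision at the second space cannot be made correctly, because the character that settles it ('=' here) lies beyond the space.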
I know that the LL(*) parser/lexer engine is capable of doing that;
however, ANTLR chooses to generate only a 1-character lookahead here.
I have temporarily worked around the problem by manually editing
the lookahead code in the generated lexer.
Michael
PS: Your explanation of lexer rule priorities was helpful.