[antlr-interest] Identifiers with Spaces

Mon Nov 29 14:23:50 PST 2010

Hi John!

On Fri, 2010-11-26 at 21:38 -0500, John B. Brodie wrote:
> i suggest something like this (untested):
> 
> ID : ID_HEAD ( ' '* ID_TAIL )* ;
> fragment ID_HEAD : LETTER ;
> fragment ID_TAIL : LETTER | DIGIT | '_' ;

That is what I tried.  I just reduced it to the bare minimum to
demonstrate the problem.

> > - Why is the lexer for test2 only using a 1 character lookahead?
> 
> because that is all that is needed to disambiguate the situation. recall
> that the lexer operates without any knowledge of parsing context. so, to
> the lexer, (assuming a rule like ID:LETTER(' '|LETTER)*) "a " is clearly
> an ID and not an 'a' followed by ' '.

I tend to disagree.  For the input
  iden tifier =
the first space is a continuation of the ID, but the second space
is just whitespace to be ignored.  To distinguish between the two cases,
the lexer would have to look past the spaces to the next non-space
character.
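
To illustrate the point, here is a minimal hand-written sketch in Python
(not ANTLR output).  It assumes a rule shaped like the one quoted above,
ID : LETTER ( ' '* (LETTER | DIGIT | '_') )* ; the function name scan_id
is made up for this example.  After each identifier character it scans
past any run of spaces before deciding whether those spaces belong to
the ID, which is the unbounded lookahead I mean.

```python
def scan_id(text, pos=0):
    """Return the index just past the identifier starting at pos.

    Returns pos unchanged if no identifier starts there.
    Assumed rule (illustration only):
        ID : LETTER ( ' '* (LETTER | DIGIT | '_') )* ;
    """
    def is_head(c):
        return c.isalpha()

    def is_tail(c):
        return c.isalnum() or c == '_'

    if pos >= len(text) or not is_head(text[pos]):
        return pos

    end = pos + 1
    while True:
        look = end
        # Look past any run of spaces; this may inspect many characters.
        while look < len(text) and text[look] == ' ':
            look += 1
        if look < len(text) and is_tail(text[look]):
            # The spaces (if any) were inside the identifier.
            end = look + 1
        else:
            # The spaces (if any) are trailing whitespace, not part of ID.
            return end


if __name__ == "__main__":
    line = "iden tifier ="
    print(repr(line[:scan_id(line)]))
```

The first space is accepted only because a letter follows it; the second
one is left for the whitespace rule because '=' follows.  A decision
based on the single character after the ID so far cannot tell these two
spaces apart.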

I know that the LL(*) parser / lexer engine is capable of doing that;
however, ANTLR chooses to generate only a 1-character lookahead here.
I have temporarily worked around the problem by manually changing
the lookahead decision in the generated lexer code.

Michael

PS: Your explanation of lexer rule priorities was helpful.