[antlr-interest] V3 lexer behaviour clarifications

Sat Mar 31 15:09:28 PDT 2007

Just trying to get my head around some of the differences between the
lexer and the parser in V3.  Am I correct in assuming that the lexer
doesn't get any of the cool new LL(*) lookahead and backtracking
that's available to the parser?

Logically, if I have two lexer rules like these:

FLOAT : INT '.' INT;
INT : ('0'..'9')+;

There's an obvious ambiguity between them, since both start with
digits, but I would expect the lexer to try matching a FLOAT first
(because I listed it first) and, only if that fails, to return an
INT and then lex whatever comes after it as a separate token.
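To make the behaviour I'm expecting concrete, here is a minimal hand-written Java sketch of it. It is not ANTLR-generated code and not how ANTLR actually works; `FallbackLexer`, `matchFloat` and `matchInt` are hypothetical names, and the `1..2` input is just an illustrative case. It tries FLOAT first and, if that fails, emits an INT and resumes lexing right after the digits:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer showing the desired fallback:
// try FLOAT : INT '.' INT first; if it fails, emit INT and resume after the digits.
public class FallbackLexer {

    // Matches ('0'..'9')+ starting at pos; returns the end index (== pos if no digit).
    static int matchInt(String s, int pos) {
        int i = pos;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        return i;
    }

    // Tries FLOAT : INT '.' INT at pos; returns the end index, or -1 on failure.
    static int matchFloat(String s, int pos) {
        int afterInt = matchInt(s, pos);
        if (afterInt == pos) return -1;                                    // no leading INT
        if (afterInt >= s.length() || s.charAt(afterInt) != '.') return -1; // no '.'
        int afterFrac = matchInt(s, afterInt + 1);
        if (afterFrac == afterInt + 1) return -1;                          // no INT after '.'
        return afterFrac;
    }

    public static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int end = matchFloat(s, pos);           // FLOAT is listed first, so try it first
            if (end != -1) {
                tokens.add("FLOAT(" + s.substring(pos, end) + ")");
                pos = end;
                continue;
            }
            end = matchInt(s, pos);                 // fall back to INT
            if (end > pos) {
                tokens.add("INT(" + s.substring(pos, end) + ")");
                pos = end;                          // resume right after the digits
                continue;
            }
            // Any other character becomes a one-char token (stand-in for other rules).
            tokens.add("CHAR(" + s.charAt(pos) + ")");
            pos++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3.14")); // prints [FLOAT(3.14)]
        System.out.println(tokenize("1..2")); // prints [INT(1), CHAR(.), CHAR(.), INT(2)]
    }
}
```

With `1..2`, the FLOAT attempt fails at the second '.', so the sketch emits INT(1) and re-lexes from the first '.' onward instead of reporting an error.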

However, when I try a grammar similar to the one above (though not
exactly that grammar), that's not what seems to happen.  The lexer
just reports an error and then treats the text as an INT.  The only
way I can get the behaviour I want is to write a composite rule with
predicates and explicit code that changes the token type, which
seems ugly.

Is this normal for now?  If so, will it be improved in the 
future?  Or am I just doing something stupid?