[antlr-interest] Lexer fails
Gavin Lambert
antlr at mirality.co.nz
Thu Jan 26 23:25:29 PST 2012
At 14:27 27/01/2012, Peter Piper wrote:
>I'm sorry that I can only talk about the old stuff (v3) but can
>anyone see how the following lexer token definition:
>
>FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e') ('-')?
>('0'..'9')+ ;
[...]
>
>There is no 'e' or 'E' in the input, so why does the ANTLR lexer
>think that this is a "better" token to output than the other one
>I want it to go for, namely:
>
>FIXED : ('0'..'9')+ '.' ('0'..'9')* ;

v3 lexers mostly use only single-character lookahead around
looping constructs, which isn't sufficient to disambiguate these
cases: both rules begin with the same ('0'..'9')+ loop, so one
character of lookahead cannot tell them apart. You need to help it
out a bit by providing explicit lookahead hints (syntactic
predicates). (Reportedly v4 is infinitely better at this, but I
haven't tried it myself yet.)

  fragment FLOAT : ('0'..'9')+ ( '.' ('0'..'9')* )? ('E' | 'e')
                   ('-')? ('0'..'9')+ ;
  FIXED : (FLOAT) => FLOAT { $type = FLOAT; }
        | ('0'..'9')+ '.' ('0'..'9')*
        ;

Or left-factor it for more efficiency (and throw an INTEGER in,
since I assume you have one of those too):

  fragment FLOAT : ;
  fragment FIXED : ;
  INTEGER : ('0'..'9')+
            ( ('.' ('0'..'9')) => '.' ('0'..'9')* { $type = FIXED; }
              ( ('E'|'e') '-'? ('0'..'9')+ { $type = FLOAT; } )?
            )?
          ;

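To make the control flow of the left-factored rule easier to follow,
here is a rough sketch of the same decisions as a hand-written
scanner in Python. It is an illustration only, not what ANTLR
generates: the name scan_number and the (type, end) return value are
made up for this example. As in the rule as written, an exponent is
only looked for after a fraction. Backing off when the exponent is
incomplete is a choice of this sketch, not a claim about the
generated lexer.

```python
DIGITS = "0123456789"


def scan_number(text, pos=0):
    """Return (token_type, end_index) for a number at text[pos], or None.

    Hypothetical helper that mirrors the left-factored INTEGER rule.
    """
    n = len(text)
    i = pos
    # ('0'..'9')+ : the common prefix shared by all three token types.
    while i < n and text[i] in DIGITS:
        i += 1
    if i == pos:
        return None  # no leading digit, so this is not a number
    kind = "INTEGER"
    # ('.' ('0'..'9')) => : only take the '.' if a digit follows it.
    if i + 1 < n and text[i] == "." and text[i + 1] in DIGITS:
        i += 1
        while i < n and text[i] in DIGITS:
            i += 1
        kind = "FIXED"
        # Optional exponent: ('E'|'e') '-'? ('0'..'9')+
        j = i
        if j < n and text[j] in "eE":
            j += 1
            if j < n and text[j] == "-":
                j += 1
            k = j
            while k < n and text[k] in DIGITS:
                k += 1
            if k > j:  # commit only if at least one exponent digit exists
                i = k
                kind = "FLOAT"
    return kind, i
```

With this sketch, "123" scans as INTEGER, "1.5" as FIXED and "1.5e-3"
as FLOAT, while "1." yields INTEGER for just the "1" because no digit
follows the dot.
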
Or just call all of these things NUMBERs and sort it out in the
parser. :)
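
A minimal sketch of that last option, in Python rather than as an
ANTLR action: it assumes the lexer has already matched a single
NUMBER token of the shapes discussed above, and only inspects the
matched text. The function name classify_number is hypothetical.

```python
def classify_number(text):
    """Classify the text of an already-matched NUMBER token.

    Assumes text contains only digits, at most one '.', and an
    optional exponent introduced by 'e' or 'E'.
    """
    if "e" in text or "E" in text:
        return "FLOAT"    # has an exponent part
    if "." in text:
        return "FIXED"    # has a fraction but no exponent
    return "INTEGER"      # digits only
```

For example, classify_number("4.2E-1") gives "FLOAT" and
classify_number("42") gives "INTEGER".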