[antlr-interest] Lexer bug?
shmuel siegel
antlr at shmuelhome.mine.nu
Sun Oct 21 06:52:18 PDT 2007
Clifford Heath wrote:
> The following will parse "10 .. 20" and "10 ..20", but not "10..20".
> It looks to me very much like it should... what's up here? It gets
> a "no viable alternate" exception.
>
> Obviously if I changed FRACTION to use DIGIT* instead of DIGIT+, it's
> not going to work in the third case above, but... without that change?
>
> Clifford Heath.
> ----
>
> grammar range;
>
> range
> : NUMBER '..' NUMBER
> ;
>
> NUMBER
> : SIGN? DIGIT+ FRACTION? EXPONENT?
> | SIGN? FRACTION EXPONENT?
> ;
>
> fragment SIGN: ('+' | '-');
> fragment FRACTION: '.' DIGIT+;
> fragment EXPONENT: ('e'|'E') SIGN? DIGIT+;
> fragment DIGIT : '0'..'9';
>
> WS: (' '|'\t'|'\r'|'\n')+ {skip();};
>
In my experience, this is not a bug. Antlr is behaving exactly as its
author intended, even though in effect it ignores the question mark on
FRACTION. Two features of Antlr seem to be coming into play:
  1. Antlr sees the first period and thinks that it can match FRACTION,
     since that is the only valid alternative (Antlr ignores the fact
     that matching nothing is also a valid alternative).
  2. Antlr sees that it can match NUMBER by ignoring one of the
     periods, so it does (Antlr3 advanced error recovery).
The problem in the first item is very similar to the Antlr2 problems
with linear approximate lookahead (I think that is its name). The
second feature is new to Antlr3.
BTW, since you are using skip() for WS and not a hidden channel, you gain
nothing by trying to do it all in the lexer. If you change NUMBER to a
parser rule and change the fragments to regular tokens, your grammar
will work.
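
A rough, untested sketch of that change, assuming Antlr3 syntax. The INT
token is a hypothetical name that is not in the original grammar; it is
introduced here so that a run of digits arrives in the parser as a single
token. DIGIT stays a fragment for the same reason, which departs slightly
from "change the fragments to regular tokens". Treat this as one possible
arrangement rather than the definitive fix.

```antlr
grammar range;

range
    : number '..' number
    ;

// Formerly the lexer rule NUMBER. As a parser rule, the choice between
// a fraction and the '..' operator is made on whole tokens.
number
    : SIGN? INT FRACTION? EXPONENT?
    | SIGN? FRACTION EXPONENT?
    ;

// Formerly fragments, now regular tokens.
SIGN     : '+' | '-' ;
INT      : DIGIT+ ;                          // hypothetical name, not in the original
FRACTION : '.' DIGIT+ ;
EXPONENT : ('e'|'E') ('+'|'-')? DIGIT+ ;     // sign inlined rather than referencing SIGN

fragment DIGIT : '0'..'9' ;

WS : (' '|'\t'|'\r'|'\n')+ { skip(); } ;
```

With this layout the lexer only has to tell '..' from FRACTION by the
character after the first period, so "10..20" should come out as
INT '..' INT. One consequence to be aware of: because WS is skipped,
input such as "10 .5" would also be accepted as a single number.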