[antlr-interest] Lexer bug?

shmuel siegel antlr at shmuelhome.mine.nu
Sun Oct 21 06:52:18 PDT 2007


Clifford Heath wrote:
> The following will parse "10 .. 20" and "10 ..20", but not "10..20".
> It looks to me very much like it should... what's up here? It gets
> a "no viable alternative" exception.
>
> Obviously if I changed FRACTION to use DIGIT* instead of DIGIT+, it's
> not going to work in the third case above, but... without that change?
>
> Clifford Heath.
> ----
>
> grammar range;
>
> range
> :       NUMBER '..' NUMBER
> ;
>
> NUMBER
> :       SIGN? DIGIT+ FRACTION? EXPONENT?
> |       SIGN? FRACTION EXPONENT?
> ;
>
> fragment SIGN:          ('+' | '-');
> fragment FRACTION:      '.' DIGIT+;
> fragment EXPONENT:      ('e'|'E') SIGN? DIGIT+;
> fragment DIGIT  :       '0'..'9';
>
> WS:     (' '|'\t'|'\r'|'\n')+ {skip();};
>
In my experience, this is not a bug. Antlr is behaving exactly as its
author intended (even though it appears to ignore the question mark on
FRACTION). Two features of Antlr seem to be coming into play.

   1. Antlr sees the first period and thinks that it can match FRACTION,
      since that is the only alternative it considers (Antlr ignores the
      fact that matching nothing is also a valid alternative).
   2. Antlr sees that it can match NUMBER by ignoring one of the
      periods, so it does (Antlr3's advanced error recovery).
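To make the first point concrete, here is a toy Python model of the choice the lexer makes when NUMBER reaches the optional FRACTION after DIGIT+. It is a hand-written sketch, not the code ANTLR generates, and it does not model the error recovery in the second point. With `lookahead=1`, a period alone commits the lexer to FRACTION, which then fails on the second period of "10..20". With `lookahead=2`, the sketch also checks that a digit follows the period, which is one way of taking the "match nothing" alternative into account. EXPONENT and NUMBER's second alternative are left out, and `LexError`, `match_number` and `tokenize` are hypothetical names.

```python
# Toy model of the FRACTION? decision inside NUMBER. NOT ANTLR's generated
# code; EXPONENT and the "SIGN? FRACTION" alternative are omitted.

class LexError(Exception):
    pass


def match_digits(text: str, pos: int) -> int:
    """Match DIGIT+ at pos and return the position after the digits."""
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if pos == start:
        raise LexError(f"expected digit at position {pos}")
    return pos


def match_number(text: str, pos: int, lookahead: int) -> int:
    """Match SIGN? DIGIT+ FRACTION? and return the end position."""
    if pos < len(text) and text[pos] in "+-":
        pos += 1
    pos = match_digits(text, pos)
    if lookahead == 1:
        # One character: any '.' commits the lexer to FRACTION.
        enter_fraction = text.startswith(".", pos)
    else:
        # Two characters: enter FRACTION only if a digit follows the '.'.
        enter_fraction = (
            text.startswith(".", pos)
            and pos + 1 < len(text)
            and text[pos + 1].isdigit()
        )
    if enter_fraction:
        pos = match_digits(text, pos + 1)  # raises on the 2nd '.' of ".."
    return pos


def tokenize(text: str, lookahead: int) -> list[str]:
    """Split input into NUMBER and '..' tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] in " \t\r\n":
            pos += 1  # WS is skipped
        elif text.startswith("..", pos):
            tokens.append("..")
            pos += 2
        else:
            end = match_number(text, pos, lookahead)
            tokens.append(text[pos:end])
            pos = end
    return tokens
```

For example, `tokenize("10..20", 1)` raises `LexError`, while `tokenize("10..20", 2)` returns `['10', '..', '20']`; "10 .. 20" and "10 ..20" succeed with either setting, matching the behaviour described in the question.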

The problem with the first item is very similar to the Antlr2 problems
with linear approximate lookahead. The second feature is new to Antlr3.

BTW, since you are using skip() for WS and not a hidden channel, you gain
nothing by trying to do it all in the lexer. If you change NUMBER to a
parser rule and change the fragments to regular tokens, your grammar
will work.
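A rough Python sketch of that restructuring follows, with EXPONENT dropped for brevity. The token names (SIGN, DIGIT, FRACTION, RANGE) and the functions `split_tokenize` and `parse_range` are hypothetical, and this is a hand-written model rather than ANTLR output. The point it illustrates is that the choice between '..' and FRACTION now happens when a token starts, where the character after the first period tells them apart, which is presumably why the restructured grammar works.

```python
# Toy model of the suggested fix: SIGN, DIGIT, FRACTION and '..' are
# ordinary tokens and NUMBER is a parser rule. EXPONENT is omitted.

def split_tokenize(text: str) -> list[tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(text):
        ch = text[pos]
        if ch in " \t\r\n":
            pos += 1  # WS is skipped
        elif text.startswith("..", pos):
            # '..' and FRACTION both start with '.', but this choice is made
            # at token start, where the second character separates them.
            tokens.append(("RANGE", ".."))
            pos += 2
        elif ch == "." and pos + 1 < len(text) and text[pos + 1].isdigit():
            end = pos + 1
            while end < len(text) and text[end].isdigit():
                end += 1
            tokens.append(("FRACTION", text[pos:end]))
            pos = end
        elif ch.isdigit():
            tokens.append(("DIGIT", ch))
            pos += 1
        elif ch in "+-":
            tokens.append(("SIGN", ch))
            pos += 1
        else:
            raise SyntaxError(f"no viable token at position {pos}")
    return tokens


def parse_range(tokens: list[tuple[str, str]]) -> tuple[str, str]:
    """range : number '..' number ;  returns both numbers as strings."""
    pos = 0

    def number() -> str:
        # number : SIGN? DIGIT+ FRACTION? | SIGN? FRACTION ;
        nonlocal pos
        text = ""
        if pos < len(tokens) and tokens[pos][0] == "SIGN":
            text += tokens[pos][1]
            pos += 1
        digits = 0
        while pos < len(tokens) and tokens[pos][0] == "DIGIT":
            text += tokens[pos][1]
            pos += 1
            digits += 1
        if pos < len(tokens) and tokens[pos][0] == "FRACTION":
            text += tokens[pos][1]
            pos += 1
        elif digits == 0:
            raise SyntaxError("expected a number")
        return text

    low = number()
    if pos >= len(tokens) or tokens[pos][0] != "RANGE":
        raise SyntaxError("expected '..'")
    pos += 1
    high = number()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return low, high
```

With this split, `parse_range(split_tokenize("10..20"))` returns `('10', '20')`. One side effect of skipping WS is that the parser never sees spaces between DIGIT tokens, so this sketch also accepts "1 0..20" as the range 10 to 20.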




