[antlr-interest] Lexer bug?

Sun Oct 21 05:09:01 PDT 2007

I'm guessing it's because "10..20" is two completely valid tokens: 10. 
and .20. Both NUMBERs, of course.

Keep in mind that you have two different machines at work. Some recent 
posts have seemed to ignore or forget that, but the lexer should be 
backtracking when needed in order to partition the input into valid 
lexemes. In this case, it turns out to be simple. The greedy rule says 
that 10. is the longest matching token. Then .20 is another valid token, 
and lexing is complete.
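
To make the greedy rule concrete, here is a rough Python sketch of a longest-match lexer. It is not ANTLR's generated code; it leans on Python's backtracking regex engine, and the names make_lexer, STRICT_NUMBER and LOOSE_NUMBER are made up for illustration. STRICT_NUMBER is my reading of the NUMBER rule as posted (FRACTION is '.' DIGIT+), and LOOSE_NUMBER assumes the DIGIT* variant mentioned in the quoted message, where a trailing dot is allowed.

```python
import re

# Assumed translation of the posted rule: FRACTION is '.' DIGIT+
STRICT_NUMBER = r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)(?:[eE][+-]?\d+)?"
# Hypothetical variant: FRACTION is '.' DIGIT*, so "10." is a NUMBER
LOOSE_NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d*)(?:[eE][+-]?\d+)?"


def make_lexer(number_pattern):
    """Build a lexer that always takes the longest match at each position."""
    rules = [
        ("NUMBER", re.compile(number_pattern)),
        ("RANGE_OP", re.compile(r"\.\.")),
        ("WS", re.compile(r"[ \t\r\n]+")),
    ]

    def lex(text):
        pos, tokens = 0, []
        while pos < len(text):
            best = None
            for name, rx in rules:
                m = rx.match(text, pos)
                # Keep the longest non-empty match; earlier rules win ties.
                if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                    best = (name, m)
            if best is None:
                raise ValueError(f"no token at position {pos}")
            name, m = best
            if name != "WS":  # whitespace is skipped, as in the grammar
                tokens.append((name, m.group()))
            pos = m.end()
        return tokens

    return lex


print(make_lexer(LOOSE_NUMBER)("10..20"))
# [('NUMBER', '10.'), ('NUMBER', '.20')]
print(make_lexer(STRICT_NUMBER)("10..20"))
# [('NUMBER', '10'), ('RANGE_OP', '..'), ('NUMBER', '20')]
```

With the loose pattern the greedy rule swallows the first dot and the range operator never appears. With the strict pattern a matcher that backs out of the failed fraction gives three tokens, which is the behaviour one would hope for; whether a generated lexer does the same is worth checking by experiment.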

The parser will gag on this, but there is no backtracking between 
layers - the parser isn't telling the lexer to retry with a different 
interpretation. The parser just horks up an error about unexpected token 
or no alternative and that's that.
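
The one-way relationship between the layers can be sketched the same way. The function below is a hypothetical stand-in for the range rule, not anything ANTLR generates: it receives a finished list of (type, text) tokens, and when the list does not fit it can only raise an error. It has no way to hand the characters back for a different split.

```python
def parse_range(tokens):
    """Match  range : NUMBER '..' NUMBER ;  against an already-lexed token list."""
    expected = ["NUMBER", "RANGE_OP", "NUMBER"]
    for i, want in enumerate(expected):
        got = tokens[i][0] if i < len(tokens) else "EOF"
        if got != want:
            # The parser only sees token types; it cannot ask for a re-lex.
            raise SyntaxError(f"unexpected token {got} at index {i}, expected {want}")
    if len(tokens) > len(expected):
        raise SyntaxError(f"unexpected token {tokens[len(expected)][0]} after range")
    return tokens[0][1], tokens[2][1]


# A token stream split the way the parser wants it:
print(parse_range([("NUMBER", "10"), ("RANGE_OP", ".."), ("NUMBER", "20")]))
# ('10', '20')

# A stream split as two NUMBERs fails, and that is the end of it:
try:
    parse_range([("NUMBER", "10."), ("NUMBER", ".20")])
except SyntaxError as err:
    print("error:", err)
```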

In theory, you should be able to describe integer literals, the range 
operator, and floats, in that order, and have it work. But I'd try it to 
be sure - I'm a little nervous about how the generated lexers handle 
nearly-ambiguous input. Something like:

INTEGER_LITERAL: DIGIT+ ;

RANGE_OP: '..' ;

FLOAT_LITERAL: /* as previously */ ;

=Austin

Clifford Heath wrote:
> The following will parse "10 .. 20" and "10 ..20", but not "10..20".
> It looks to me very much like it should... what's up here? It gets
> a "no viable alternate" exception.
>
> Obviously if I changed FRACTION to use DIGIT* instead of DIGIT+, it's
> not going to work in the third case above, but... without that change?
>
> Clifford Heath.
> ----
>
> grammar range;
>
> range
> :       NUMBER '..' NUMBER
> ;
>
> NUMBER
> :       SIGN? DIGIT+ FRACTION? EXPONENT?
> |       SIGN? FRACTION EXPONENT?
> ;
>
> fragment SIGN:          ('+' | '-');
> fragment FRACTION:      '.' DIGIT+;
> fragment EXPONENT:      ('e'|'E') SIGN? DIGIT+;
> fragment DIGIT  :       '0'..'9';
>
> WS:     (' '|'\t'|'\r'|'\n')+ {skip();};
>
>
>