[antlr-interest] Lexer bug?
Austin Hastings
Austin_Hastings at Yahoo.com
Sun Oct 21 05:09:01 PDT 2007
I'm guessing it's because "10..20" splits into two completely valid tokens,
"10." and ".20". Both NUMBERs, of course.
Keep in mind that you have two different machines at work: the lexer and
the parser. Some recent posts seem to have ignored or forgotten that, but
the lexer should be backtracking when needed in order to partition the
input into valid lexemes. In this case, it turns out to be simple. The
greedy rule says that "10." is the longest matching token. Then ".20" is
another valid token, and lexing is complete.
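A minimal Python model of that greedy (longest-match) partitioning. It is not the code ANTLR generates. It assumes a fraction may have zero digits after the '.', which is the DIGIT* variant Clifford mentions; only under that assumption is "10." a NUMBER by itself. The token names and regexes are hypothetical stand-ins for the grammar rules.

```python
import re

# Hypothetical token table. Assumption: FRACTION may have zero digits
# (the DIGIT* variant), so "10." is a complete NUMBER on its own.
TOKEN_SPECS = [
    ("NUMBER", re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")),
    ("RANGE", re.compile(r"\.\.")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Maximal munch: at each offset take the longest match, with ties going
    to the rule listed first. The lexer never consults the parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_text = None, ""
        for kind, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_text):
                best_kind, best_text = kind, m.group()
        if best_kind is None:
            raise SyntaxError(f"no token matches at offset {pos}: {text[pos:]!r}")
        if best_kind != "WS":  # whitespace is dropped, like {skip();}
            tokens.append((best_kind, best_text))
        pos += len(best_text)
    return tokens


print(tokenize("10..20"))    # [('NUMBER', '10.'), ('NUMBER', '.20')]
print(tokenize("10 .. 20"))  # [('NUMBER', '10'), ('RANGE', '..'), ('NUMBER', '20')]
```

With the space present, no NUMBER can start at the first '.', so RANGE is the only match there.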
The parser will gag on this, but there is no backtracking between the
layers: the parser doesn't tell the lexer to retry with a different
interpretation. It just horks up an error about an unexpected token or
"no viable alternative", and that's that.
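A sketch of that one-way hand-off. It uses a hypothetical recursive-descent function for the `range : NUMBER '..' NUMBER ;` rule, not ANTLR's generated parser. The token kind `RANGE` and the error text are made up for illustration. The function only consumes the token list it is given and has no way to ask the lexer for a different split.

```python
class ParseError(Exception):
    pass


def parse_range(tokens):
    """Hypothetical parser for: range : NUMBER '..' NUMBER ;
    Works on a fixed token list; it never re-lexes the input."""
    expected = ["NUMBER", "RANGE", "NUMBER"]
    for i, want in enumerate(expected):
        if i >= len(tokens):
            raise ParseError(f"unexpected end of input, expected {want}")
        kind, text = tokens[i]
        if kind != want:
            raise ParseError(f"unexpected token {text!r} ({kind}), expected {want}")
    if len(tokens) > len(expected):
        raise ParseError(f"extra input starting at {tokens[len(expected)][1]!r}")
    return float(tokens[0][1]), float(tokens[2][1])


# Token stream assumed to come from a greedy lexer on "10..20":
try:
    parse_range([("NUMBER", "10."), ("NUMBER", ".20")])
except ParseError as err:
    print(err)  # unexpected token '.20' (NUMBER), expected RANGE

print(parse_range([("NUMBER", "10"), ("RANGE", ".."), ("NUMBER", "20")]))  # (10.0, 20.0)
```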
In theory, you should be able to describe integer literals, the range
operator, and floats, in that order, and have it work. But I'd try it to
be sure - I'm a little nervous about how the generated lexers handle
nearly-ambiguous input. Something like:
INTEGER_LITERAL: DIGIT+ ;
RANGE_OP: '..' ;
FLOAT_LITERAL: /* as previously */ ;
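A rough Python model of that three-rule layout, for experimenting before touching the grammar. It uses longest-match lexing with ties going to the earlier rule, which is a simplification and not ANTLR's actual decision logic. FLOAT_LITERAL is elided above, so the pattern here is an assumption: a float needs at least one digit after the '.', or an exponent.

```python
import re

# Hypothetical regex stand-ins, listed in the suggested order.
# The FLOAT_LITERAL pattern is an assumption (the post elides it).
RULES = [
    ("INTEGER_LITERAL", re.compile(r"\d+")),
    ("RANGE_OP", re.compile(r"\.\.")),
    ("FLOAT_LITERAL", re.compile(r"(?:\d+\.\d+|\.\d+)(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Longest match wins; on equal length the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_text = None, ""
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_text):
                best_kind, best_text = kind, m.group()
        if best_kind is None:
            raise SyntaxError(f"no token matches at offset {pos}: {text[pos:]!r}")
        if best_kind != "WS":  # whitespace is skipped
            tokens.append((best_kind, best_text))
        pos += len(best_text)
    return tokens


print(tokenize("10..20"))
# [('INTEGER_LITERAL', '10'), ('RANGE_OP', '..'), ('INTEGER_LITERAL', '20')]
print(tokenize("1.5..2.5"))
# [('FLOAT_LITERAL', '1.5'), ('RANGE_OP', '..'), ('FLOAT_LITERAL', '2.5')]
```

In this model the listed order only breaks ties between equal-length matches. What keeps "10..20" intact is that neither float alternative can match a '.' that is followed by another '.'.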
=Austin
Clifford Heath wrote:
> The following will parse "10 .. 20" and "10 ..20", but not "10..20".
> It looks to me very much like it should... what's up here? It gets
> a "no viable alternative" exception.
>
> Obviously if I changed FRACTION to use DIGIT* instead of DIGIT+, it's
> not going to work in the third case above, but... without that change?
>
> Clifford Heath.
> ----
>
> grammar range;
>
> range
> : NUMBER '..' NUMBER
> ;
>
> NUMBER
> : SIGN? DIGIT+ FRACTION? EXPONENT?
> | SIGN? FRACTION EXPONENT?
> ;
>
> fragment SIGN: ('+' | '-');
> fragment FRACTION: '.' DIGIT+;
> fragment EXPONENT: ('e'|'E') SIGN? DIGIT+;
> fragment DIGIT : '0'..'9';
>
> WS: (' '|'\t'|'\r'|'\n')+ {skip();};
>
>
>
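For comparison, here is a regex transcription of the quoted NUMBER rule as written, with FRACTION requiring DIGIT+. It is run through a lexer that can back off an optional sub-pattern within a single token (Python's `re`). Under that model "10..20" comes out as NUMBER '..' NUMBER, because a '.' followed by another '.' cannot start FRACTION. This is only a model and says nothing definitive about the lexer ANTLR generates. That distinction is where the nervousness above about nearly-ambiguous input applies. The token kind `RANGE` is a hypothetical name for the '..' literal.

```python
import re

# Regex transcription of the quoted fragments (FRACTION: '.' DIGIT+).
DIGIT = r"[0-9]"
SIGN = r"[+-]"
FRACTION = rf"\.{DIGIT}+"
EXPONENT = rf"[eE]{SIGN}?{DIGIT}+"
NUMBER = re.compile(
    rf"{SIGN}?{DIGIT}+(?:{FRACTION})?(?:{EXPONENT})?|{SIGN}?{FRACTION}(?:{EXPONENT})?"
)

RULES = [
    ("NUMBER", NUMBER),
    ("RANGE", re.compile(r"\.\.")),  # hypothetical name for the '..' literal
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Longest match per offset; re backtracks inside a match, so after
    '10' the optional FRACTION is simply skipped when no digit follows '.'."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_text = None, ""
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_text):
                best_kind, best_text = kind, m.group()
        if best_kind is None:
            raise SyntaxError(f"no token matches at offset {pos}: {text[pos:]!r}")
        if best_kind != "WS":
            tokens.append((best_kind, best_text))
        pos += len(best_text)
    return tokens


print(tokenize("10..20"))  # [('NUMBER', '10'), ('RANGE', '..'), ('NUMBER', '20')]
```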