[antlr-interest] Problem with simple tokens
Gavin Lambert
antlr at mirality.co.nz
Sat Aug 23 03:54:51 PDT 2008
At 21:11 23/08/2008, Markus Stoeger wrote:
>rule1: Foo ('.' | '!');
>
>Foo: 'foo';
>Identifier: 'a'..'z'+ ('.' 'a'..'z'+)*;
>--- CUT ---
>
>When running that in the debugger it matches "foo!" but not "foo.",
>which causes a MismatchedTokenException.
>
>Why doesn't it match "foo."?
>
>It has something to do with the Identifier token (which contains
>a dot) but I don't understand why.. note that to match as
>Identifier the dot would have to be followed by at least one
>letter, which isn't the case with "foo.". Also the token Foo
>should have precedence over the token Identifier as it is
>defined earlier.
As I explained earlier today, the ANTLR lexer looks ahead only as
far as it thinks it needs to in order to disambiguate the
alternatives -- and that's not always far enough to get it
"right".
In this case, the input 'foo' could match either Foo or
Identifier; on its own, ANTLR will choose Foo, since it's listed
first. Given the input 'foo.', though, this could be either
"Foo '.'" or "Identifier" (admittedly not a complete Identifier,
but the lexer doesn't realise that yet), so it picks Identifier
since that consumes more of the input in one go. Having committed
to Identifier, it then fails when no letter follows the dot.
You can force ANTLR to use extra lookahead with the slightly more
verbose:
Identifier: 'a'..'z'+ (('.' 'a'..'z') => '.' 'a'..'z'+)*;
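
To make the difference concrete, here is a toy Python model of the two versions of the Identifier rule. It is only a sketch and not the code ANTLR generates: the function names and the MismatchedToken class are made up for illustration, and it models just the loop inside Identifier, not the choice between Foo and Identifier.

```python
class MismatchedToken(Exception):
    """Hypothetical stand-in for a lexer match failure."""


def is_letter(ch):
    return 'a' <= ch <= 'z'


def match_letters(text, pos):
    """Consume 'a'..'z'+ starting at pos and return the new position."""
    if pos >= len(text) or not is_letter(text[pos]):
        raise MismatchedToken(f"expected a letter at offset {pos}")
    while pos < len(text) and is_letter(text[pos]):
        pos += 1
    return pos


def identifier_one_char_lookahead(text, pos=0):
    """Models: 'a'..'z'+ ('.' 'a'..'z'+)*

    The loop is entered as soon as the next character is '.',
    so a trailing dot commits the matcher to a letter that never comes.
    """
    pos = match_letters(text, pos)
    while pos < len(text) and text[pos] == '.':
        pos = match_letters(text, pos + 1)
    return pos


def identifier_with_predicate(text, pos=0):
    """Models: 'a'..'z'+ (('.' 'a'..'z') => '.' 'a'..'z'+)*

    The loop is entered only if the dot is followed by a letter,
    so a trailing dot is left in the input for a later token.
    """
    pos = match_letters(text, pos)
    while pos + 1 < len(text) and text[pos] == '.' and is_letter(text[pos + 1]):
        pos = match_letters(text, pos + 1)
    return pos
```

In this model, identifier_one_char_lookahead("foo.") raises MismatchedToken, while identifier_with_predicate("foo.") stops after "foo" and leaves the dot unconsumed. Both accept all of "foo.bar".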