[antlr-interest] MismatchedTokenException in sample grammar
Gavin Lambert
antlr at mirality.co.nz
Sun Oct 5 03:09:47 PDT 2008
At 22:22 5/10/2008, cyrilg31 at free.fr wrote:
>SLASH : '/';
>
>CHAR : ('a'..'z'|'A'..'Z'|'.'|'-');
>
>SEPARATOR_WR : SLASH 'WR';
>SEPARATOR_BR : SLASH 'BR';
>SEPARATOR_YH : SLASH 'YH';
[...]
>When I try with this message: /WRblabla/BRRFT/WNmj/YHkijg , I have
>a " problem matching token at 1:18 MismatchedTokenException(78!=82)"
>and I don't recover all the information
The problem is the way that ambiguous tokens like these are
resolved. Given the rules defined above, ANTLR sees a leading '/'
and says "ok, if the next character is a W then it's a
SEPARATOR_WR, if it's a B then it's a SEPARATOR_BR, if it's a Y
then it's a SEPARATOR_YH, if it's anything else then it's just a
SLASH". It's not until it gets "inside" the SEPARATOR_WR rule
that it looks at the character following that, discovers that it's
not an R (in your message it's the N of /WN, hence the 78!=82), and
throws an error.
(Essentially, ANTLR always statically chooses the least amount of
lookahead it can get away with to disambiguate between each
possible single token; it doesn't consider sequences of tokens or
possible mismatches "later on" within the token.)
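To make that concrete, here is a rough Python sketch of the decision described above. It is not ANTLR's generated code, and the names (next_token, tokenize, MismatchedToken) are made up for illustration. It also treats every non-slash character as a CHAR, which is looser than the original CHAR rule. It predicts the token from the single character after the '/', commits to that rule, and only then checks the second character:

```python
# Hypothetical simulation of the lexer decision described above;
# this is not ANTLR's generated code.
SEPARATORS = {"W": "WR", "B": "BR", "Y": "YH"}  # first char -> full suffix


class MismatchedToken(Exception):
    pass


def next_token(text, pos):
    """Return (token_type, new_pos), predicting on one char after '/'."""
    if text[pos] != "/":
        return "CHAR", pos + 1  # simplification: any non-slash char
    la = text[pos + 1] if pos + 1 < len(text) else ""
    suffix = SEPARATORS.get(la)
    if suffix is None:
        return "SLASH", pos + 1
    # Committed to SEPARATOR_xx on the strength of one character;
    # only now is the second character checked.
    got = text[pos + 2] if pos + 2 < len(text) else ""
    if got != suffix[1]:
        raise MismatchedToken(f"{ord(got) if got else -1}!={ord(suffix[1])}")
    return "SEPARATOR_" + suffix, pos + 3


def tokenize(text):
    pos, out = 0, []
    while pos < len(text):
        tok, pos = next_token(text, pos)
        out.append(tok)
    return out
```

Feeding it the sample message fails at the /WN, with 'N' (78) found where 'R' (82) was expected, while /WR and /BR earlier in the message go through.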
You can resolve this sort of problem by merging the rules together
and using predicates to force ANTLR to look ahead further (and it
also helps to extract common prefixes):
fragment SEPARATOR_WR: 'WR';
fragment SEPARATOR_BR: 'BR';
fragment SEPARATOR_YH: 'YH';

SLASH
    : '/'
      ( (SEPARATOR_WR) => SEPARATOR_WR { $type = SEPARATOR_WR; }
      | (SEPARATOR_BR) => SEPARATOR_BR { $type = SEPARATOR_BR; }
      | (SEPARATOR_YH) => SEPARATOR_YH { $type = SEPARATOR_YH; }
      )?
    ;
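As a rough illustration of what the merged rule does, here is a small Python sketch. It is not ANTLR output, the function names are hypothetical, and it treats every non-slash character as a CHAR for brevity. The startswith check plays the role of the syntactic predicate: the whole two-letter suffix has to be present before that alternative is taken, otherwise the optional part is skipped and a plain SLASH comes out.

```python
# Hypothetical simulation of the merged SLASH rule; not ANTLR output.
SUFFIXES = ("WR", "BR", "YH")


def next_token(text, pos):
    """Return (token_type, new_pos) for the token starting at pos."""
    if text[pos] != "/":
        return "CHAR", pos + 1  # simplification: any non-slash char
    for suffix in SUFFIXES:
        # Stands in for the (SEPARATOR_xx) => predicate: look at the
        # whole suffix before choosing the alternative.
        if text.startswith(suffix, pos + 1):
            return "SEPARATOR_" + suffix, pos + 1 + len(suffix)
    return "SLASH", pos + 1  # optional subrule not taken


def tokenize(text):
    pos, out = 0, []
    while pos < len(text):
        tok, pos = next_token(text, pos)
        out.append(tok)
    return out
```

With this behaviour the /WN in the sample message becomes a SLASH followed by CHAR tokens for W and N instead of an error. Whether the parser then accepts that sequence depends on the parser rules, which aren't shown here.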