[antlr-interest] MismatchedTokenException in sample grammar

Sun Oct 5 03:09:47 PDT 2008

At 22:22 5/10/2008, cyrilg31 at free.fr wrote:
 >SLASH : '/';
 >
 >CHAR : ('a'..'z'|'A'..'Z'|'.'|'-');
 >
 >SEPARATOR_WR : SLASH 'WR';
 >SEPARATOR_BR : SLASH 'BR';
 >SEPARATOR_YH : SLASH 'YH';
[...]
 >When I try with this message: /WRblabla/BRRFT/WNmj/YHkijg , I have
 >a " problem matching token at 1:18 MismatchedTokenException(78!=82)"
 >and I don't recover all the information

The problem is the way that ambiguous tokens like these are 
resolved.  Given the rules defined above, ANTLR sees a leading '/' 
and says "ok, if the next character is a W then it's a 
SEPARATOR_WR, if it's a B then it's a SEPARATOR_BR, if it's a Y 
then it's a SEPARATOR_YH, if it's anything else then it's just a 
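SLASH".  It's not until it gets "inside" the SEPARATOR_WR rule 
that it looks at the character following that, discovers that it's 
not an R, and throws an error.  In your message that happens at 
"/WN": the 78!=82 in the exception is the character code of the 
'N' it found versus the 'R' it expected.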

(Essentially, ANTLR always statically chooses the least amount of 
lookahead it can get away with to disambiguate between each 
possible single token; it doesn't consider sequences of tokens or 
possible mismatches "later on" within the token.)
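
As a rough illustration of that fixed-lookahead decision, here is a 
small Python model.  It is not ANTLR's generated code: the names 
next_token, tokenize and MismatchedToken are made up for this 
sketch, and the CHAR rule's character set is not checked.

```python
class MismatchedToken(Exception):
    """Hypothetical stand-in for ANTLR's MismatchedTokenException."""


# Character after '/' -> (token name, full text the chosen rule must match).
SEPARATORS = {
    "W": ("SEPARATOR_WR", "/WR"),
    "B": ("SEPARATOR_BR", "/BR"),
    "Y": ("SEPARATOR_YH", "/YH"),
}


def next_token(text, pos):
    """Return (token_name, new_pos), deciding with one character of lookahead."""
    if text[pos] != "/":
        return "CHAR", pos + 1  # simplification: any non-slash counts as CHAR
    # Decision point: only the character right after '/' is inspected.
    name, expected = SEPARATORS.get(text[pos + 1:pos + 2], ("SLASH", "/"))
    # The rule is now committed and must match in full.
    for offset, want in enumerate(expected):
        got = text[pos + offset:pos + offset + 1]
        if got != want:
            code = ord(got) if got else -1  # -1 models end of input
            raise MismatchedToken(f"at index {pos + offset}: {code}!={ord(want)}")
    return name, pos + len(expected)


def tokenize(text):
    """Collect token names until the input is exhausted or a rule fails."""
    pos, names = 0, []
    while pos < len(text):
        name, pos = next_token(text, pos)
        names.append(name)
    return names
```

In this model, tokenize("/WRab") lexes cleanly, while the message 
from the original post fails at the 'N' of "/WN" with 78!=82, 
because the choice of SEPARATOR_WR was made before the 'R' was 
ever checked.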

You can resolve this sort of problem by merging the rules together 
and using predicates to force ANTLR to look ahead further (and it 
also helps to extract common prefixes):

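// Fragments are never emitted as tokens on their own; here they
// name the two-letter suffixes and supply the token types used
// in the actions below.
fragment SEPARATOR_WR: 'WR';
fragment SEPARATOR_BR: 'BR';
fragment SEPARATOR_YH: 'YH';

// Each predicate checks the whole suffix before committing.  If
// none matches, the optional block is skipped and the token stays
// a plain SLASH.
SLASH
   : '/'
     ( (SEPARATOR_WR) => SEPARATOR_WR { $type = SEPARATOR_WR; }
     | (SEPARATOR_BR) => SEPARATOR_BR { $type = SEPARATOR_BR; }
     | (SEPARATOR_YH) => SEPARATOR_YH { $type = SEPARATOR_YH; }
     )?
   ;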
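
The behaviour of the merged rule can be modelled in Python roughly 
like this.  It is only a sketch of the intended effect, not what 
ANTLR generates: next_token and tokenize are hypothetical names, 
and the CHAR rule's character set is not checked.

```python
# Two-letter suffix -> token type assigned when the predicate succeeds.
SEPARATORS = {
    "WR": "SEPARATOR_WR",
    "BR": "SEPARATOR_BR",
    "YH": "SEPARATOR_YH",
}


def next_token(text, pos):
    """Return (token_name, new_pos), checking the whole suffix before committing."""
    if text[pos] != "/":
        return "CHAR", pos + 1  # simplification: any non-slash counts as CHAR
    # Plays the role of the syntactic predicates: look at both characters
    # after '/' before deciding which alternative to take.
    suffix = text[pos + 1:pos + 3]
    if suffix in SEPARATORS:
        return SEPARATORS[suffix], pos + 3
    # No suffix matched: the optional block is skipped, leaving a plain SLASH.
    return "SLASH", pos + 1


def tokenize(text):
    """Collect token names until the input is exhausted."""
    pos, names = 0, []
    while pos < len(text):
        name, pos = next_token(text, pos)
        names.append(name)
    return names
```

With this model the message /WRblabla/BRRFT/WNmj/YHkijg no longer 
fails: "/WN" comes out as a SLASH followed by ordinary CHAR 
tokens, and the three real separators are still recognised.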