[antlr-interest] MismatchedTokenException

Jim Idle jimi at temporal-wave.com
Wed Dec 16 11:23:40 PST 2009


I think the problem is that you are trying to use the gated predicate to keep consuming characters. Instead, let the gated predicate only select the rule and do the consuming in action code. Here is a working example:

grammar T;

aaa : rule+ EOF
   ;
   
rule
  : classtok
  | ident
  ;
  
classtok : CLASS;
ident : IDENTIFIER;

CLASS
  :
  'class'
  ;


// The gated predicate only selects this rule; the action consumes the rest.
IDENTIFIER
  :
  {Character.isJavaIdentifierStart(input.LA(1))}?=> .
  { while (Character.isJavaIdentifierPart(input.LA(1))) { input.consume(); } }
  ;

WS : (' '|'\t'|'\n'|'\r')+ { skip(); } ;

As previously stated, your rule here will cause the lexer to just barf on any character that is invalid. If you instead construct the set of characters that cannot be anything else in your token set and use that set in your while loop, you will be able to check the IDENTIFIER you pick up and validate it yourself, resulting in a much nicer error message. If you can rely on the input being good, then perhaps you don't need to worry about that.
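
A rough sketch of that validation idea in plain Java, outside the lexer. Everything here is an assumption made for illustration: the class name IdentifierCheck, its methods, the wording of the messages, and the choice of whitespace as the only delimiter (which fits the toy grammar above but probably not a real one). It also works on chars rather than code points, so supplementary characters are not handled.

```java
final class IdentifierCheck {
    // Assumption: in this toy grammar only whitespace (or end of input, -1)
    // can end an identifier-like run of characters.
    static boolean isDelimiter(int c) {
        return c == -1 || c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Returns the index just past the run of non-delimiter characters
    // that begins at 'start'.
    static int scanRun(String text, int start) {
        int i = start;
        while (i < text.length() && !isDelimiter(text.charAt(i))) {
            i++;
        }
        return i;
    }

    // Returns null if the lexeme is a valid Java identifier,
    // otherwise a readable error message.
    static String validate(String lexeme) {
        if (lexeme.isEmpty()) {
            return "empty identifier";
        }
        if (!Character.isJavaIdentifierStart(lexeme.charAt(0))) {
            return "identifier '" + lexeme + "' cannot start with '"
                    + lexeme.charAt(0) + "'";
        }
        for (int i = 1; i < lexeme.length(); i++) {
            char c = lexeme.charAt(i);
            if (!Character.isJavaIdentifierPart(c)) {
                return "identifier '" + lexeme + "' contains invalid character '"
                        + c + "' at offset " + i;
            }
        }
        return null;
    }
}
```

With this, scanning "foo-bar baz" from offset 0 picks up the whole run "foo-bar", and validate("foo-bar") reports the '-' at offset 3 instead of the lexer failing on it. Inside the grammar, the same two steps would presumably live in the IDENTIFIER action: consume up to the next delimiter, then check the token text and report the problem yourself.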

Jim

> -----Original Message-----
> From: Marcin Rzeznicki [mailto:marcin.rzeznicki at gmail.com]
> Sent: Wednesday, December 16, 2009 10:45 AM
> To: Jim Idle
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] MismatchedTokenException
> 
> 2009/12/14 Marcin Rzeźnicki <marcin.rzeznicki at gmail.com>:
> > 2009/12/13 Jim Idle <jimi at temporal-wave.com>:
> >> This usually means that your lexer token numbers are out of sync with
> >> your parser tokens. Regen in correct order and make sure all tokens
> >> have been declared.
> >>
> >
> > Umm, what if I work with combined grammar? And some of literals are
> > 'inlined'?
> >
> 
> I think I know what has been causing this problem but I am scratching
> my head. It seems that ANTLR lexer is, well, a strange beast.
> I have a rule, say
> CLASS
>   :
>   'class'
>   ;
> 
> and below
> 
> IDENTIFIER
>   :
>   {Character.isJavaIdentifierStart(input.LA(1))}?=> . ( {Character.isJavaIdentifierPart(input.LA(1))}?=> . )*
>   ;
> 
> (the latter rule has been questioned here, but bear with me a while, I
> need it to explain my case)
> 
> Now, upon seeing input 'class' ANTLR matches IDENTIFIER because of
> this gating predicate. Well, 'class' would have been a valid
> identifier, of course but shouldn't it try to match 'class' based on
> rules precedence?
> --
> Greetings
> Marcin Rzeźnicki




