[antlr-interest] mismatched character

Markus Stoeger spamhole at gmx.at
Thu Jan 8 11:17:43 PST 2009


Oh how many times have I run into this problem... no it's not obvious...

The lexer always tries to return the token that matches the most
characters, *even* if the rest of the input then fails to match the
whole token.

In your grammar the problem starts at " As". What you'd like to get is a
WS followed by an ALPHA. But what the lexer tries to give you instead is
an incomplete ANTE, because WS matches only one character (' ') while
ANTE matches two (" A"). So ANTE wins and WS gets discarded. Stupid but
performant lexer. It throws the exception because the next character is
's', not the 'n' that ANTE requires, and by then it has already
forgotten about the correct but discarded WS token.
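
To make the failure mode concrete, here is a toy Java model of the
"predict, commit, never go back" behaviour described above. It is only a
sketch under my own assumptions: the class name CommitLexerSketch, the
tokenize method and the anteEnabled flag are made up for illustration,
only the ANTE, WS and ALPHA rules are modelled, and this is not how
ANTLR's generated lexer is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a lexer that commits to a predicted rule; NOT ANTLR's real code. */
class CommitLexerSketch {
    // Text of the hypothetical ANTE rule, including its surrounding spaces.
    static final String ANTE = " Ante ";

    /** Tokenizes input; anteEnabled mirrors uncommenting the ANTE rule. */
    static List<String> tokenize(String input, boolean anteEnabled) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            boolean nextIsA = pos + 1 < input.length() && input.charAt(pos + 1) == 'A';
            if (anteEnabled && c == ' ' && nextIsA) {
                // Prediction: " A" is longer than a one-space WS, so commit to ANTE.
                for (int i = 0; i < ANTE.length(); i++) {
                    // End of input is reported as '\0' in this sketch.
                    char actual = pos + i < input.length() ? input.charAt(pos + i) : '\0';
                    if (actual != ANTE.charAt(i)) {
                        // No fallback to WS here: the alternative was already discarded.
                        throw new IllegalStateException("mismatched character '" + actual
                                + "' expecting '" + ANTE.charAt(i) + "'");
                    }
                }
                tokens.add("ANTE");
                pos += ANTE.length();
            } else if (c == ' ') {
                // WS : ' '+
                while (pos < input.length() && input.charAt(pos) == ' ') pos++;
                tokens.add("WS");
            } else if (Character.isLetter(c)) {
                // ALPHA : one or more letters
                int start = pos;
                while (pos < input.length() && Character.isLetter(input.charAt(pos))) pos++;
                tokens.add("ALPHA(" + input.substring(start, pos) + ")");
            } else {
                throw new IllegalStateException("no viable rule for '" + c + "'");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Without ANTE the space is an ordinary WS token.
        System.out.println(tokenize("Ad As", false));
        // With ANTE enabled the same input fails at the 's'.
        try {
            tokenize("Ad As", true);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the input "Ad As" lexes fine while ANTE is disabled, and
fails at the 's' once ANTE is enabled, which is the same shape as the
error in the original report.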

I see you have already fixed the problem in the meantime. I would also
have removed the spaces from the ANTE token. I also wonder whether you
really need the spaces at all. Couldn't you just discard them so they
don't end up in the parser rules? It doesn't really matter, but it would
simplify your rules. Also, I wouldn't create tokens like "INT : DIGIT+
COMMA_SP?". Better to pull that up into parser rules and put as little
into lexer rules as possible.

hope that helps,
Markus

ian eyberg wrote:
> Hi list,
>   I have a problem with a very simple
> grammar. Whenever I uncomment the
> lexer rule ANTE in this grammar, it spits out
>
> line 1:18 mismatched character 's' expecting 'n'
>
>
> test file:
> *** TURN *** [Ad As 6d] [Ts]
>
> the grammar:
>
>
> grammar Blah;
> options {language=Java;}
>
> line  : caction .* NEWLINE
>         { System.out.println("YO"); } ;
>
> caction :  TURN cards '] [' ca=cards ']'
>           { System.out.println($ca.text); } ;
>
> cards : ((ALPHA | INT) WS?)+ ;
>
> fragment LOWER_LETTER   : 'a'..'z' ;
> fragment UPPER_LETTER   : 'A'..'Z' ;
> ALPHA : (LOWER_LETTER | UPPER_LETTER)+ ;
>
> COMMA_SP: ',' ' ' ;
>
> fragment DIGIT  : '0'..'9' ;
> INT : DIGIT+ COMMA_SP?;
>
> NEWLINE : '\r'? '\n' ;
>
> WS      :   ' '+ ;
> COLON_SPACE : ': ' ;
>
> TURN      : '*** TURN *** [' ;
> //ANTE  : ' Ante ' ;
>
>
>
> Can anyone spot the obvious that I'm missing?
>
> Thanks,
> Ian
>


