[antlr-interest] mismatched character
Markus Stoeger
spamhole at gmx.at
Thu Jan 8 11:17:43 PST 2009
Oh, how many times have I run into this problem... no, it's not obvious.
The lexer always goes for the token that could match the most
characters, *even* if the input then turns out not to match the whole
token.
In your grammar the problem starts at " As". What you'd like to get is a
WS followed by an ALPHA. But what the lexer tries to give you instead is
an incomplete ANTE because the WS matches only one character, while ANTE
matches two characters. So ANTE wins and WS gets discarded. Stupid but
performant lexer. It throws the exception because ANTE requires 'n' as
the next character but the input has 's', and it cannot fall back since
it has already forgotten about the correct but discarded WS token.
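
To make the " As" case concrete, here is a stripped-down sketch of the
three rules involved (ANTLR 3 syntax). The grammar name Repro is made up
for illustration, and the trace in the comments is a reading of the
behaviour described above, not captured lexer output:

```
lexer grammar Repro;

ALPHA : ('a'..'z' | 'A'..'Z')+ ;
WS    : ' '+ ;
ANTE  : ' Ante ' ;

// Input: " As"
//   ' '   could start either WS or ANTE
//   'A'   only ANTE can continue with 'A', so the lexer picks ANTE
//   's'   ANTE needs 'n' here: mismatched character 's' expecting 'n'
```
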
I see you have already fixed the problem in the meantime. I would also
have removed the spaces from the ANTE token. I also wonder if you even
really need the spaces at all. Couldn't you just discard them so they
don't end up in the parser rules? It doesn't really matter but would
simplify your rules. Also I wouldn't create tokens like "INT : DIGIT+
COMMA_SP?". Better pull that up into parser rules and put as little into
lexer rules as possible.
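
A rough, untested sketch of those suggestions, showing only the rules
that change (everything else would stay as in the quoted grammar).
Sending WS to the hidden channel and listing ANTE ahead of ALPHA are
assumptions added here, not something taken from the original grammar:

```
// Parser: WS no longer appears, and the optional ", " after a number
// is handled here instead of inside the INT token.
cards : (ALPHA | INT COMMA_SP?)+ ;

// ANTE without the surrounding spaces, so it no longer starts with the
// same character as WS. Assumption: it is placed before ALPHA so the
// more general letter rule does not shadow it.
ANTE  : 'Ante' ;
ALPHA : (LOWER_LETTER | UPPER_LETTER)+ ;

// INT is just the digits now.
INT   : DIGIT+ ;

// Spaces are matched but kept away from the parser rules.
// { skip(); } would drop them entirely instead.
WS    : ' '+ { $channel = HIDDEN; } ;
```

With the spaces off the default channel, the parser rules only have to
describe the cards themselves.
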
hope that helps,
Markus
ian eyberg wrote:
> Hi list,
> I have a problem with a very simple
> grammar. Whenever I uncomment the
> lexer rule ANTE in this grammar, it spits out
>
> line 1:18 mismatched character 's' expecting 'n'
>
>
> test file:
> *** TURN *** [Ad As 6d] [Ts]
>
> the grammar:
>
>
> grammar Blah;
> options {language=Java;}
>
> line : caction .* NEWLINE
> { System.out.println("YO"); } ;
>
> caction : TURN cards '] [' ca=cards ']'
> { System.out.println($ca.text); } ;
>
> cards : ((ALPHA | INT) WS?)+ ;
>
> fragment LOWER_LETTER : 'a'..'z' ;
> fragment UPPER_LETTER : 'A'..'Z' ;
> ALPHA : (LOWER_LETTER | UPPER_LETTER)+ ;
>
> COMMA_SP: ',' ' ' ;
>
> fragment DIGIT : '0'..'9' ;
> INT : DIGIT+ COMMA_SP?;
>
> NEWLINE : '\r'? '\n' ;
>
> WS : ' '+ ;
> COLON_SPACE : ': ' ;
>
> TURN : '*** TURN *** [' ;
> //ANTE : ' Ante ' ;
>
>
>
> Can anyone spot the obvious that I'm missing?
>
> Thanks,
> Ian
>