[antlr-interest] Weird (to me) grammar problem [solved]

Evan Driscoll driscoll at cs.wisc.edu
Wed Dec 2 11:03:09 PST 2009


Evan Driscoll wrote:
> I just started work on a grammar to read, well, context-free grammars,
> and am running into a problem. (I'm probably just doing something dumb.)
> I've attached my grammar.
> 
> The ARROW token (used between the left and right sides of a production)
> should recognize either ':' or '->', but the AntlrWorks interpreter only
> accepts '->'. If I try to parse the input 'a -> b;', I get the proper
> result. If I try to parse 'a : b;', it gives a MismatchedTokenException.
> (I am pretty sure I saw the same behavior using the debug option, but I
> don't have the JDK on this computer and can't confirm it.)
> 
> The rules in question are:
> 
>   COLON : ':'; // used in multiple places
> 
>   ARROW
>       : '->'
>       | COLON
>       ;
> 
>   production
>       : SYMBOL ARROW disjunction SEMICOLON
>       ;

Okay, I figured it out.

Since COLON is listed first, the ':' in the input stream gets lexed as a
COLON token and not as an ARROW.

My mistake was born of a misunderstanding of what the docs mean when
they say that ANTLR lexing uses the same LL(*) parsing strategy as the
parser proper. I figured that the parser would be working through the
'production' rule, get to the use of ARROW, then go and call mARROW() in
the lexer, and mARROW() would consume the ':' and emit an ARROW token.

However, before it gets to that point the lookahead framework needs a
token stream, and so it calls mTokens(). mTokens() knows nothing about
which token the parser is expecting; it sees the ':' sitting in the
input stream and (correctly) uses the COLON rule.


The fix I put in place was to remove the COLON alternative from the
ARROW lexer rule (the 'ARROW: COLON' production), create a new
non-terminal 'arrow: COLON | ARROW', and change the use of 'ARROW' in
'production' to 'arrow'.
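
For the archives, the changed rules would look roughly like this. It is
only a sketch: SYMBOL, disjunction and SEMICOLON are assumed to be
defined as in the attached grammar, which isn't reproduced here, and the
layout is illustrative.

```
// Lexer rules: each token now matches exactly one piece of input text,
// so mTokens() never has to choose between two rules for ':'.
COLON : ':' ;   // still used in multiple places
ARROW : '->' ;

// Parser rule: the "either ':' or '->'" choice is made by the parser,
// which sees whichever token the lexer actually emitted.
arrow
    : COLON
    | ARROW
    ;

production
    : SYMBOL arrow disjunction SEMICOLON
    ;
```

With this arrangement 'a : b;' should lex as SYMBOL COLON SYMBOL
SEMICOLON and 'a -> b;' as SYMBOL ARROW SYMBOL SEMICOLON, and both token
sequences match 'production' through 'arrow'.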

Evan

