[antlr-interest] Weird (to me) grammar problem [solved]
Evan Driscoll
driscoll at cs.wisc.edu
Wed Dec 2 11:03:09 PST 2009
Evan Driscoll wrote:
> I just started work on a grammar to read, well, context-free grammars,
> and am running into a problem. (I'm probably just doing something dumb.)
> I've attached my grammar.
>
> The ARROW token (used between the left and right sides of a production)
> should recognize either ':' or '->', but the AntlrWorks interpreter only
> accepts '->'. If I try to parse the input 'a -> b;', I get the proper
> result. If I try to parse 'a : b;', it gives a MismatchedTokenException.
> (I am pretty sure I saw the same behavior using the debug option, but I
> don't have the JDK on this computer and can't confirm it.)
>
> The rules in question are:
>
> COLON : ':'; // used in multiple places
>
> ARROW
> : '->'
> | COLON
> ;
>
> production
> : SYMBOL ARROW disjunction SEMICOLON
> ;
Okay, I figured it out.
Since COLON is listed before ARROW in the lexer, the ':' in the input
stream gets lexed as a COLON token and not as an ARROW.
My mistake was born out of a misunderstanding of what the docs mean
about ANTLR lexing using the same LL(*) parsing strategy as the parser
proper. I figured that it would be parsing the 'production' rule, get to
the use of ARROW, then go and call mARROW() in the lexer, and mARROW()
would consume the ':' and emit an ARROW token.
However, before that point the lookahead framework needs to get a token
stream, and so it calls mTokens(), which knows nothing about which parser
rule is currently being matched. mTokens() sees the ':' sitting in the
input stream and (correctly) uses the COLON rule.
The fix I put in place was to remove the COLON alternative from the ARROW
lexer rule, create a new non-terminal (parser rule) 'arrow: COLON | ARROW',
and change the use of 'ARROW' in 'production' to 'arrow'.
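For reference, a minimal sketch of what the rules look like after that
change, in ANTLR 3 syntax. It assumes SYMBOL, SEMICOLON and disjunction
are defined elsewhere in the grammar as before; only the rules quoted
above are shown, and the layout is illustrative rather than a copy of
the attached file.

```antlr
COLON : ':' ;    // still a plain token, used in multiple places

ARROW : '->' ;   // the lexer now emits ARROW only for '->'

// Parser rule (lowercase): the choice between ':' and '->' is made
// by the parser, after the lexer has already produced its tokens.
arrow
    : COLON
    | ARROW
    ;

production
    : SYMBOL arrow disjunction SEMICOLON
    ;
```

With this arrangement, both 'a -> b;' and 'a : b;' should reach
'production' with a token that 'arrow' accepts, since the lexer no
longer has two rules competing for the same ':' character.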
Evan