[antlr-interest] Mismatched token problem
Kevin J. Cummings
cummings at kjchome.homeip.net
Tue Jan 13 19:59:10 PST 2009
Richard Wallace wrote:
> Hello,
>
> I am trying to write a rule to match expressions in the following algebraic form
>
> an+b
>
> But, when the b term is negative it is only allowed to be written as
>
> an-b
>
> It seems easy enough, the problem is that identifiers can have the '-'
> character in them. So I have the following in my grammar
>
> expr
> : DASH? NUMBER? 'n' S* ( PLUS | DASH ) S* NUMBER
> ;
>
> DASH
> : '-'
> ;
>
> PLUS
> : '+'
> ;
>
> IDENT
> : ('_' | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' )
> ('_' | DASH | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' |
> '0'..'9')*
> | DASH ('_' | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' )
> ('_' | DASH | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' |
> '0'..'9')*
> ;
>
> NUMBER
> : '-' (('0'..'9')* '.')? ('0'..'9')+
> | (('0'..'9')* '.')? ('0'..'9')+
> ;
> S
> : ( ' ' | '\t' | '\r' | '\n' | '\f' )
> ;
>
> So, when I try this grammar against 4n+3 it works great. But, if I
> try it against 4n-1 it fails with a MismatchedTokenException. This
> seems to be because when evaluating 4n-1 antlr matches the expression
> as NUMBER IDENT instead of NUMBER 'n' DASH NUMBER. I've tried
> changing the lookahead and using backtracking all to no avail. I'm
> out of ideas on how to make antlr stop seeing the n-1 as an IDENT and
> instead see it as 'n' DASH NUMBER. Any suggestions?
Take the '-' out of the NUMBER production (i.e. remove the first
alternative):

NUMBER
    : (('0'..'9')* '.')? ('0'..'9')+
    ;

Why is '-' a valid IDENT character? And are you using IDENT anywhere
else in your grammar? I don't see it referenced in the snippet above.
If you need '-' in IDENT names, you may need a predicate so that it
doesn't get confused with its use in expr. Where can IDENTs be used?

By default the ANTLR lexer tries to match as many characters as it can
for each token. This happens before parsing starts, so the parser
cannot steer it: IDENT is a lexer rule (i.e. made up of characters),
whereas expr is a parser rule (made up of tokens).
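
To illustrate the lexer behaviour described above, here is a rough
Python sketch. It is not ANTLR and does not reproduce ANTLR's actual
prediction algorithm; it is a simplified longest-match tokenizer,
restricted to ASCII, with the posted rules transcribed as regular
expressions. The names tokenize, ORIGINAL and NO_DASH are made up for
this example, and the N rule is an assumption standing in for the
literal 'n' used in expr.

```python
import re

def tokenize(text, rules):
    """Simplified longest-match tokenizer; on a tie the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_lexeme = None, ""
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer matches replace the current best, so ties
            # are resolved in favour of the rule listed first.
            if m and len(m.group()) > len(best_lexeme):
                best_name, best_lexeme = name, m.group()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, best_lexeme))
        pos += len(best_lexeme)
    return tokens

# Rules roughly as posted: '-' may start a NUMBER and may appear in an IDENT.
ORIGINAL = [
    ("N", r"n"),  # assumption: models the literal 'n' in expr
    ("NUMBER", r"-?(?:[0-9]*\.)?[0-9]+"),
    ("IDENT", r"-?[_A-Za-z][_A-Za-z0-9-]*"),
    ("DASH", r"-"),
    ("PLUS", r"\+"),
]

# Hypothetical variant: '-' removed from both NUMBER and IDENT.
NO_DASH = [
    ("N", r"n"),
    ("NUMBER", r"(?:[0-9]*\.)?[0-9]+"),
    ("IDENT", r"[_A-Za-z][_A-Za-z0-9]*"),
    ("DASH", r"-"),
    ("PLUS", r"\+"),
]

if __name__ == "__main__":
    for label, rules in (("original", ORIGINAL), ("no dash", NO_DASH)):
        for text in ("4n+3", "4n-1"):
            print(label, text, tokenize(text, rules))
```

In this model, "4n-1" under the original rules comes out as a NUMBER
followed by a single IDENT "n-1", so a parser rule expecting 'n' DASH
NUMBER never gets the tokens it needs. With '-' removed from both rules
the same input splits into four tokens. If '-' really must stay legal
inside identifiers, this variant is not an option and something like
the predicate mentioned above would be needed instead.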
> Thanks,
> Rich
--
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)