[antlr-interest] Mismatched token problem
Richard Wallace
rwallace at thewallacepack.net
Wed Jan 14 08:00:17 PST 2009
On Tue, Jan 13, 2009 at 8:59 PM, Kevin J. Cummings
<cummings at kjchome.homeip.net> wrote:
> Richard Wallace wrote:
>>
>> Hello,
>>
>> I am trying to write a rule to match expressions in the following
>> algebraic form
>>
>> an+b
>>
>> But, when the b term is negative it is only allowed to be written as
>>
>> an-b
>>
>> It seems easy enough; the problem is that identifiers can have the '-'
>> character in them. So I have the following in my grammar:
>>
>> expr
>> : DASH? NUMBER? 'n' S* ( PLUS | DASH ) S* NUMBER
>> ;
>>
>> DASH
>> : '-'
>> ;
>>
>> PLUS
>> : '+'
>> ;
>>
>> IDENT
>> : ('_' | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' )
>> ('_' | DASH | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' |
>> '0'..'9')*
>> | DASH ('_' | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' )
>> ('_' | DASH | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' |
>> '0'..'9')*
>> ;
>>
>> NUMBER
>> : '-' (('0'..'9')* '.')? ('0'..'9')+
>> | (('0'..'9')* '.')? ('0'..'9')+
>> ;
>>
>> S
>> : ( ' ' | '\t' | '\r' | '\n' | '\f' )
>> ;
>>
>> So, when I try this grammar against 4n+3 it works great. But, if I
>> try it against 4n-1 it fails with a MismatchedTokenException. This
>> seems to be because when evaluating 4n-1 antlr matches the expression
>> as NUMBER IDENT instead of NUMBER 'n' DASH NUMBER. I've tried
>> changing the lookahead and using backtracking all to no avail. I'm
>> out of ideas on how to make antlr stop seeing the n-1 as an IDENT and
>> instead see it as 'n' DASH NUMBER. Any suggestions?
>
> Take the '-' out of the NUMBER production (i.e., remove the first alternative):
>
> NUMBER : (('0'..'9')* '.')? ('0'..'9')+
> ;
>
Ah, good point. I had forgotten that was there. Thanks.

> Why is '-' a valid IDENT character? And are you using IDENT anywhere else
> in your grammar? I don't see it referenced in the snippet above.
> If you need to use '-' in IDENT names, you may need to use a predicate so it
> doesn't get confused with its use in expr. Where can IDENTs be used?
> By default, ANTLR's lexer will try to match as much input as it can for each
> token. This happens long before parsing starts. IDENT is a lexer rule (i.e.,
> made up of characters), whereas expr is a parser rule (made up of tokens).
>
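If I understand that correctly, the effect can be imitated outside ANTLR. Here is a rough Python sketch, not ANTLR itself: the regular expressions are my approximations of the lexer rules above (NUMBER without the leading '-' alternative), the N rule stands in for the literal 'n' in expr, and tokenize is a made-up helper that always takes the longest match.

```python
import re

# Approximations of the grammar's lexer rules (assumptions, not ANTLR output).
# "N" stands in for the literal 'n' referenced by the expr parser rule.
TOKEN_RULES = [
    ("N", re.compile(r"n")),
    ("DASH", re.compile(r"-")),
    ("PLUS", re.compile(r"\+")),
    ("IDENT", re.compile(r"-?[_a-zA-Z\u0100-\ufffe][-_a-zA-Z0-9\u0100-\ufffe]*")),
    ("NUMBER", re.compile(r"(?:[0-9]*\.)?[0-9]+")),
    ("S", re.compile(r"[ \t\r\n\f]")),
]


def tokenize(text):
    """Split text into (rule name, lexeme) pairs, longest match first."""
    pos = 0
    tokens = []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Strictly longer wins; on a tie the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("4n+3"))
print(tokenize("4n-1"))
```

With these rules, 4n+3 splits into four tokens, but in 4n-1 the text n-1 is a longer IDENT match than n alone, so the parser never gets to see a DASH.
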
I can't really say why '-' is a valid IDENT character. I wish it
weren't, but it is, and I am powerless to change it. IDENT is used in
quite a few places; I just sent in a shorter, more distilled version of
the grammar as an example of the problem. A few of the rules where IDENT
is used are:

type : IDENT ;
id : '#' IDENT ;
class : '.' IDENT ;

I've been reading up on predicates, but I don't fully grasp how to apply
one in this case. I thought that doing something like the Lexer Lookahead
example on the page
<http://www.antlr.org/wiki/display/~gbrose85/7.++Common+Rules+and+Examples>
might do it, but that would also mean that if 'n' were used as an
identifier elsewhere, it wouldn't get matched as an IDENT as it should.
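
To make that worry concrete, here is a rough Python sketch (not ANTLR, and not the wiki example itself). The N rule with its lookahead is my own guess at what such a guarded rule might look like, and the rule names and the tokenize helper are made up for illustration. Rules are tried in order and the first match wins.

```python
import re

# Hypothetical rule set: N only matches an 'n' that is followed by a sign and
# a digit, imitating a lexer rule guarded by lookahead. N is listed before
# IDENT, so it takes priority whenever its lookahead condition holds.
RULES = [
    ("HASH", re.compile(r"#")),
    ("N", re.compile(r"n(?=[-+][0-9])")),
    ("IDENT", re.compile(r"-?[_a-zA-Z\u0100-\ufffe][-_a-zA-Z0-9\u0100-\ufffe]*")),
    ("NUMBER", re.compile(r"(?:[0-9]*\.)?[0-9]+")),
    ("DASH", re.compile(r"-")),
    ("PLUS", re.compile(r"\+")),
]


def tokenize(text):
    """Split text into (rule name, lexeme) pairs, first matching rule wins."""
    pos = 0
    tokens = []
    while pos < len(text):
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens


print(tokenize("4n-1"))  # the expression case
print(tokenize("#n-1"))  # an id selector whose name is n-1
print(tokenize("#nav"))  # an ordinary identifier starting with n
```

This splits 4n-1 into NUMBER, N, DASH, NUMBER as I want, but the lexer has no idea what context it is in, so the id in #n-1 gets split the same way instead of coming out as a single IDENT.
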

I don't normally ask for this much hand-holding, but I'm drawing a
blank here. Think you could walk me through what you mean by using a
predicate?

Thanks again,
Rich
>> Thanks,
>> Rich
>
> --
> Kevin J. Cummings
> kjchome at rcn.com
> cummings at kjchome.homeip.net
> cummings at kjc386.framingham.ma.us
> Registered Linux User #1232 (http://counter.li.org)
>