[antlr-interest] Mismatched token problem

Wed Jan 14 15:51:19 PST 2009

Richard Wallace wrote:

> Ok, I'm feeling really dense right now.  I put the rules in as follows:
> 
> fragment IDENTFRAGMENT
>     : ('_' | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' )
>     ;
> 
> fragment IDENTNUMFRAGMENT
>     : IDENTFRAGMENT | '0' .. '9'
>     ;
> 
> IDENT
>     : IDENTFRAGMENT ( DASH | IDENTNUMFRAGMENT )*
>     ;
> 
> DASH
>     : '-' ( options{greedy=true;} : IDENTFRAGMENT { $type = IDENT; } )?
>     ;
> 
> And I even understand what it means (I think), but I'm still running
> into the problem that in the expression 4n-1, n-1 is still being
> considered an expression.  I had to change protected to fragment to

Sorry, I thought you were using ANTLR 2.7.7; that must have been someone 
else I was chatting with. Yes, fragment is correct for ANTLR 3.x.

> get the lexer to not try and match 4 as an IDENTNUMFRAGMENT and the
> IDENT rule to match the language.  But I don't think that should cause
> this not to work, should it?  I must be missing something.  Any ideas?

In your expr rule you specify S* as possible whitespace separators. 
Also, if you need to match n-1 as an IDENT, then it's possible that you 
need to add another fragment to catch the 'n' and what follows as an IDENT.

By default, ANTLR does greedy matching of tokens. In other words, each 
lexer rule tries to match as much input as it can.  It also tokenizes 
before it parses, so the parser cannot influence how the input is split 
up.  So, if you don't want 4n-1 to be NUMBER IDENT, then you need to 
have a lexer rule to catch something different.  Does it help if you 
try a lexer rule that catches NUMBER 'n' as a single token, and then 
use *that* in your expr rule?

Also, when I code expression parsers that don't care about whitespace, I 
just set whitespace to be ignored in the lexer.  The lexer will still 
end the current token when it hits whitespace.  So, in general, I never 
reference whitespace in the parser.  You need to fix your token stream 
so that the parser does the right thing with what it finds.
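
A rough sketch of what I mean, in ANTLR 3 syntax (untested, and assuming 
the Java target; WS is just a placeholder name, in your grammar it would 
be whatever rule currently produces S):

    // Hypothetical whitespace rule: tokens sent to the HIDDEN channel
    // are still lexed, but the parser never sees them.
    WS
        : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; }
        ;

With something like that in place, the S* references could come out of 
the expr rule.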

Make a lexer rule for DASH? NUMBER? 'n', or maybe just for NUMBER 'n'.
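
Something along these lines for the narrower NUMBER 'n' form, as an 
untested sketch (DIGIT and NTERM are names I made up, and I'm assuming 
your NUMBER rule is just a run of digits):

    // Hypothetical helper, not from your grammar.
    fragment DIGIT
        : '0'..'9'
        ;

    // Lexes "4n" as a single token, so the 'n' is consumed before it
    // can start an IDENT.
    NTERM
        : DIGIT+ 'n'
        ;

If that works, 4n-1 should lex as NTERM DASH NUMBER rather than NUMBER 
IDENT, since the '-' is then followed by a digit and your DASH rule only 
turns into an IDENT when an IDENTFRAGMENT follows.  The wider form with 
DASH? and NUMBER? overlaps with IDENT for a bare n, so I'd try the 
narrow one first.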

Sorry for being vague, but I hope it's helpful.

> Rich

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)