[antlr-interest] Mismatched token problem

Kevin J. Cummings cummings at kjchome.homeip.net
Wed Jan 14 09:06:00 PST 2009


Richard Wallace wrote:
> I can't really say why '-' is a valid IDENT character.  I wish it
> weren't but it is and I am powerless to change it.  IDENT is used in
> quite a few places, I just sent in a shorter more distilled version of
> the grammar as an example of the problem.  A few rules where the IDENT
> is used are
> 
> type : IDENT ;
> id : '#' IDENT ;
> class : '.' IDENT ;

The last time I saw a "-" in an IDENT was in COBOL....

> I've been reading up on predicates trying to understand how to apply
> them in this case and I don't fully grasp how to apply it here.  I
> thought that maybe doing something like the Lexer Lookahead example on
> the page <http://www.antlr.org/wiki/display/~gbrose85/7.++Common+Rules+and+Examples>
> might do it, but that would also mean that if 'n' was used as an
> identifier elsewhere it wouldn't get parsed as an IDENT as it should.
> 
> I don't normally ask for this much hand-holding but I'm drawing a
> blank here.  Think you could walk me through what you mean by using a
> predicate?

Predicates are either syntactic or semantic and are documented in the 
Antlr reference guide at http://www.antlr2.org/doc (which documents 
version 2.7.5).  Also there is this really good example of complex 
lexing at 
http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs
Note, this is an Antlr 3 lexer module, but most of it applies if you 
replace the "fragment" stuff with "protected".  I used that style in my 
PL/I lexer for both the various arithmetic numbers and again for strings.

So, if you do something like:

protected
IDENTFRAGMENT : ('_' | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' )
               ;

protected
IDENTNUMFRAGMENT : IDENTFRAGMENT | '0' .. '9'
                  ;

IDENT : IDENTFRAGMENT ( DASH (IDENTNUMFRAGMENT)? )*
       ;

DASH : '-' ( IDENTFRAGMENT  { _ttype = IDENT; } )?
      ;

you should be able to cover most of the forms an IDENT can take in your 
grammar.  (No, I haven't tried this.)
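
If it helps to see the lookahead decision outside of Antlr, here is a rough 
sketch in Python.  It is not a translation of the rules above, only an 
illustration of the idea behind the DASH rule: a '-' is its own token unless 
an identifier character follows it, in which case the token is retyped as 
IDENT.  The tokenize() helper, the OTHER token type, and the choice to let 
dashes and digits run freely inside an identifier are my assumptions for the 
illustration, not something taken from your grammar.

```python
import re

# Assumed character classes, loosely mirroring IDENTFRAGMENT and
# IDENTNUMFRAGMENT: letters, underscore, and \u0100-\ufffe may start an
# identifier; digits and '-' are additionally allowed after the start.
IDENT_START = r"[_a-zA-Z\u0100-\ufffe]"
IDENT_REST = r"[_a-zA-Z0-9\u0100-\ufffe-]"


def tokenize(text):
    """Return (type, text) pairs. Hypothetical helper, not Antlr output."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        # One character of lookahead, empty string at end of input.
        nxt = text[i + 1] if i + 1 < len(text) else ""
        starts_ident = re.match(IDENT_START, ch) is not None
        # The DASH trick: '-' followed by an identifier start becomes IDENT.
        dash_then_ident = ch == "-" and re.match(IDENT_START, nxt) is not None
        if starts_ident or dash_then_ident:
            j = i + 1
            while j < len(text) and re.match(IDENT_REST, text[j]):
                j += 1
            tokens.append(("IDENT", text[i:j]))
            i = j
        elif ch == "-":
            # A dash with nothing identifier-like after it stays a DASH.
            tokens.append(("DASH", "-"))
            i += 1
        else:
            tokens.append(("OTHER", ch))
            i += 1
    return tokens
```

With this sketch, tokenize("a-b - c") gives an IDENT for "a-b", a DASH for 
the lone "-", and an IDENT for "c".  The real lexer makes the same kind of 
choice, but inside the DASH rule by resetting the token type.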

In my own lexer I also threw in some options { greedy=true; } : stuff to 
get rid of the warning messages Antlr spits out:

DASH : '-' ( options { greedy=true; } : IDENTFRAGMENT ..... )?

You may need to play with the protected rules to get things right for 
your grammar, but the method is sound.

> Thanks again,
> Rich

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)

