[antlr-interest] Mismatched token problem
Kevin J. Cummings
cummings at kjchome.homeip.net
Wed Jan 14 09:06:00 PST 2009
Richard Wallace wrote:
> I can't really say why '-' is a valid IDENT character. I wish it
> weren't but it is and I am powerless to change it. IDENT is used in
> quite a few places, I just sent in a shorter more distilled version of
> the grammar as an example of the problem. A few rules where the IDENT
> is used is
>
> type : IDENT ;
> id : '#' IDENT ;
> class : '.' IDENT ;
The last time I saw a "-" in an IDENT was in COBOL....
> I've been reading up on predicates trying to understand how to apply
> them in this case and I don't fully grasp how to apply it here. I
> thought that maybe doing something like the Lexer Lookahead example on
> the page <http://www.antlr.org/wiki/display/~gbrose85/7.++Common+Rules+and+Examples>
> might do it, but that would also mean that if 'n' was used as an
> identifier elsewhere it wouldn't get parsed as an IDENT as it should.
>
> I don't normally ask for this much hand-holding but I'm drawing a
> blank here. Think you could walk me through what you mean by using a
> predicate?
Predicates are either syntactic or semantic, and both kinds are documented
in the ANTLR reference guide at http://www.antlr2.org/doc (which documents
version 2.7.5). There is also a really good example of complex
lexing at
http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs
Note that this is an ANTLR 3 lexer, but most of it applies to ANTLR 2 if you
replace the "fragment" keyword with "protected". I used that style in my
PL/I lexer, both for the various arithmetic numbers and again for strings.
So, if you do something like:
    protected
    IDENTFRAGMENT : ( '_' | 'a'..'z' | 'A'..'Z' | '\u0100'..'\ufffe' )
        ;
    protected
    IDENTNUMFRAGMENT : IDENTFRAGMENT | '0'..'9'
        ;
    IDENT : IDENTFRAGMENT ( DASH (IDENTNUMFRAGMENT)? )*
        ;
    DASH : '-' ( IDENTFRAGMENT { _ttype = IDENT; } )?
        ;
You should be able to cover most of your cases of how an IDENT looks.
(No I haven't tried this.)
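
To make the intended behaviour concrete, here is a rough hand-written
approximation in Python. It is only a sketch of the intent and not a
translation of the ANTLR rules: the tokenize() function, the CHAR token
type, and the assumption that an identifier may contain runs of letters and
digits between dashes are all mine, so adjust them to what your grammar
really allows.

```python
import re

# Hypothetical character classes, modelled on IDENTFRAGMENT and
# IDENTNUMFRAGMENT from the rules above.
IDENT_START = r"[_a-zA-Z\u0100-\ufffe]"
IDENT_CHAR = r"[_a-zA-Z0-9\u0100-\ufffe]"

# Assumed identifier shape: an optional leading '-', one start character,
# then any mix of identifier characters and embedded dashes.
IDENT_RE = re.compile(rf"-?{IDENT_START}(?:{IDENT_CHAR}|-)*")


def tokenize(text):
    """Split text into (type, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        match = IDENT_RE.match(text, pos)
        if match:
            # Covers both "foo-bar" and a dash directly followed by an
            # identifier start character, which is retyped as IDENT.
            tokens.append(("IDENT", match.group()))
            pos = match.end()
        elif ch == "-":
            # A dash with no identifier start after it stays a DASH.
            tokens.append(("DASH", ch))
            pos += 1
        else:
            # Anything else is passed through as a single-character token.
            tokens.append(("CHAR", ch))
            pos += 1
    return tokens
```

With this sketch, tokenize("foo-bar") gives a single IDENT, while the dash
in "a - 1" comes out as a DASH because no identifier character follows it.
Note that "a-1" without spaces is still one IDENT, which matches what the
IDENT rule above would accept.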
I also threw in some options { greedy=true; } : stuff to get rid of the
warning messages ANTLR spits out:
    DASH : '-' ( options { greedy=true; } : IDENTFRAGMENT ..... )?
You may need to play with the protected rules to get things right for
your grammar, but the method is sound.
> Thanks again,
> Rich
--
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)