[antlr-interest] What's the best way to differentiate identifiers and keywords in the lexer?

Mon Jan 27 08:56:20 PST 2003

The GCC toolkit uses a handy approach.  Wrap the rule that tests for
literals in another rule.  That way you get the proper token type (since the
literals table isn't tested until the end of the rule).  Then in the calling
rule do your logic based on the real token type, and use $setType() to
preserve the returned type.
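
A minimal sketch of that wrapping idea in ANTLR 2 syntax, not taken from the
GCC toolkit itself.  The names KeywordLexer, ID_OR_KEYWORD and IDENT are made
up for illustration, the action language is assumed to be Java, and the
keyword is declared in the lexer's tokens section only for brevity (it could
just as well come from the parser's vocabulary):

    class KeywordLexer extends Lexer;

    options {
        testLiterals = false;   // only IDENT consults the literals table
    }

    tokens {
        KEYWORD1 = "KEYWORD1";  // hypothetical keyword literal
    }

    // The only public rule: this is what the parser receives.
    ID_OR_KEYWORD
        :   t:IDENT
            {
                // t already has the type resolved by the literals table
                if (t.getType() == KEYWORD1) {
                    // code specific to KEYWORD1 goes here
                }
                // hand the resolved type (keyword or IDENT) to the parser
                $setType(t.getType());
            }
        ;

    // Inner rule: the literals table is tested when this rule finishes.
    protected
    IDENT
        options { testLiterals = true; }
        :   ('a'..'z' | 'A'..'Z' | '_')
            ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
        ;

With this layout the parser still sees KEYWORD1 as its own token type, and
any per-keyword code lives in the action of the outer rule.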

By the way, what exactly do you want to do in the lexer anyhow?  There might
be other solutions if we have more detail.

Monty

-----Original Message-----
From: Anthony Brenelière [mailto:abreneliere at telys.com]
Sent: Monday, January 27, 2003 4:26 AM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] What's the best way to differentiate
identifiers and keywords in the lexer ?

What's the best way to differentiate identifiers and keywords in the
lexer?

I have read about the following solutions for avoiding nondeterminism:

--------
1. Using the token list, or using the strings "(keyword)" in the parser,
for keywords.

..but the problem is that I need a rule for each keyword, so that I can
attach code to execute to it.
--------
2. Using syntactic predicates in the lexer

..but the problem is that I have to send back a TOKEN that is not the
TOKEN of the keyword itself.

I would have something like:

KEY_OR_ID : (KEYWORD1)=> KEYWORD1 | ... | (KEYWORDn)=> KEYWORDn
;

ID : ('a'..'z'|'A'..'Z'|'_')+
;

protected KEYWORD1 : "KEYWORD1" { my code 1 } ;
(...)
protected KEYWORDn : "KEYWORDn" { my code n } ;

..but I could not return the KEYWORDi token to the parser.
---------

Is there a third (good) solution with ANTLR?

Cordially,
Anthony B.
