[antlr-interest] What's the best way to differentiate identifiers and keywords in the lexer ?

Mon Jan 27 04:50:28 PST 2003

I define all the keywords as imaginary tokens in the parser.
I could define them as empty protected rules in the lexer,
for example (protected FINALLY:;), but ANTLR generates code for
empty rules as well.
In a separate class (I do this for convenience) I create a Map
whose keys are the upper-case keyword strings and whose values are
the int token type codes assigned by ANTLR (in the token types).
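
Since the attachment was scrubbed from the archive, here is a minimal sketch of what such a class could look like. The class name KeywordTable and the numeric codes are placeholders of mine, not the contents of the real VbsConstants.java; in a real project the codes would come from the token types interface that ANTLR generates.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the attached VbsConstants class.
final class KeywordTable {
    // Placeholder token type codes (assumption). Real values are assigned
    // by ANTLR and live in the generated token types interface.
    static final int IDENTIFIER = 4;
    static final int FINALLY = 5;
    static final int REM = 6;

    // Upper-case keyword text -> token type code.
    static final Map<String, Integer> KEYWORDS;

    static {
        Map<String, Integer> m = new HashMap<String, Integer>();
        m.put("FINALLY", Integer.valueOf(FINALLY));
        m.put("REM", Integer.valueOf(REM));
        KEYWORDS = Collections.unmodifiableMap(m);
    }

    private KeywordTable() {
        // Constants holder, not meant to be instantiated.
    }
}
```

The keys are stored in upper case because the lexer action upper-cases the identifier text before the lookup, which makes keyword matching case-insensitive.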

And then I have the following rule in the Lexer:

IDENTIFIER_TYPES
   :
   id:IDENTIFIER {
       /*
         Since ASP uses ActiveX components
         and an ActiveX can have a method or attribute with
         name equal to some keyword, be sure that the last token
         wasn't DOT
       */
       if (lastToken != DOT) {
           String idText = id.getText().toUpperCase();
           Object intVal = VbsConstants.KEYWORDS.get(idText);
           if (intVal != null) {
               _ttype = ((Integer) intVal).intValue();
           } else {
               _ttype = IDENTIFIER;
           }
       } else {
           _ttype = IDENTIFIER;
       }
       if (_ttype != REM) {
           lastToken = _ttype;
       } else {
           mLINE(false);
           _ttype = Token.SKIP;
       }
   }
   ;

In this way you can perform various actions based on the keyword.
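
To make the decision logic easier to follow (and to test outside the generated lexer), the keyword lookup from the action can be sketched as a plain Java method. KeywordResolver, resolve and the numeric token codes are hypothetical names and values chosen for illustration; the REM/skip handling from the rule is left out.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper mirroring the lookup part of the lexer action.
final class KeywordResolver {
    // Placeholder token type codes (assumption, not ANTLR's real values).
    static final int IDENTIFIER = 4;
    static final int FINALLY = 5;
    static final int REM = 6;
    static final int DOT = 7;

    private static final Map<String, Integer> KEYWORDS =
            new HashMap<String, Integer>();

    static {
        KEYWORDS.put("FINALLY", Integer.valueOf(FINALLY));
        KEYWORDS.put("REM", Integer.valueOf(REM));
    }

    private KeywordResolver() {
    }

    // Returns the token type for the matched identifier text, given the
    // type of the previously emitted token.
    static int resolve(String text, int lastToken) {
        // After a DOT the name is a member of a component, so it is
        // always an identifier, even if it is spelled like a keyword.
        if (lastToken == DOT) {
            return IDENTIFIER;
        }
        Integer type = KEYWORDS.get(text.toUpperCase(Locale.ROOT));
        return type != null ? type.intValue() : IDENTIFIER;
    }
}
```

For example, resolve("Rem", IDENTIFIER) yields REM, while resolve("Rem", DOT) yields IDENTIFIER. The sketch passes Locale.ROOT to toUpperCase so that the result does not depend on the default locale of the machine; the rule above uses the no-argument form.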

Anthony Brenelière wrote:
> What's the best way to differentiate identifier and keywords in the
> lexer ?
> 
> I have read the following solutions to avoid nondeterminism :
> 
> --------
> 1. using the token list, or use the strings "(keyword)" in the parser,
> for keywords.
> 
> ..but the problem is that i need some rule to assign code to execute,
> for each keyword.
> --------
> 2. using the syntactic predicates in the lexer
> 
> ..but the problem is that i have to send back a TOKEN that is not the
> TOKEN of the keyword itself.
> 
> I would have something like:
> 
> KEY_OR_ID : (KEYWORD1)=> KEYWORD1 | ... | (KEYWORDn)=> KEYWORDn
> ;
> 
> ID : ('a'..'z'|'A'..'Z'|'_')
> ;
> 
> protected KEYWORD1 : "KEYWORD1" { my code 1 } ;
> (...)
> protected KEYWORDn : "KEYWORDn" { my code n } ;
> 
> ..but I could not return the KEYWORDi token to the parser.
> ---------
> 
> 
> Is there another third (good) solution with ANTLR ?
> 
> Cordially,
> Anthony B.
> 
> 
>  
> 
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: VbsConstants.java
Type: java/*
Size: 2602 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20030127/357bb29b/VbsConstants.bin