[antlr-interest] Possible general solution to problem of keywords as identifiers

Fri Sep 5 01:01:10 PDT 2008

At 10:21 5/09/2008, Ron Hunter-Duvar wrote:
 >Conversely, if the lexer always gave preference to returning
 >keywords, then match would be modified to check for keywords
 >that should be treated as identifiers, something like this:
 >
 >    if ( input.LA(1)==ttype ||
 >         (ttype==IDENTIFIER_TTYPE && isKeywordType(input.LA(1))) ) {
 >
 >where "isKeywordType(input.LA(1))" somehow checks (perhaps with a
 >hash table) that the input token type is a keyword that could be
 >interpreted as an identifier. This approach might suffer from
 >the same maintenance issues that the current approaches do.

Rather than doing that, you could just as easily write this parser 
rule:

identifier
     :  { isKeywordType(input.LA(1)) }? => .
     |  IDENTIFIER
     ;
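
The first version relies on an isKeywordType helper that the grammar has to supply itself, for example in an @members section or a separate class. A minimal sketch of one way to write it for the Java target, using the hash table idea from the quoted message. The class name KeywordTypes and the token type numbers are made up for illustration; a real grammar would use the constants from the generated parser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; the class name and token numbers are illustrative only.
final class KeywordTypes {
    // Placeholder token types. A real grammar would use the constants
    // from the ANTLR-generated parser instead of these numbers.
    static final int IDENTIFIER = 4;
    static final int KEYWORD1 = 5;
    static final int KEYWORD2 = 6;
    static final int KEYWORD3 = 7;

    // Keywords that may also be used where an identifier is expected.
    private static final Set<Integer> IDENTIFIER_LIKE =
        Collections.unmodifiableSet(new HashSet<Integer>(
            Arrays.asList(KEYWORD1, KEYWORD2, KEYWORD3)));

    private KeywordTypes() {}

    // True if the token type is a keyword that can stand in for an identifier.
    static boolean isKeywordType(int tokenType) {
        return IDENTIFIER_LIKE.contains(tokenType);
    }
}
```

With something like this in place, the predicate could call KeywordTypes.isKeywordType(input.LA(1)), and the set is the only thing to touch when a keyword is added.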

Or simply list all the possible keywords:

identifier
     :  IDENTIFIER
     |  KEYWORD1 | KEYWORD2 | KEYWORD3 | KEYWORD4 | ...
     ;

(I normally go for the second option.  It seems cleaner.)

I don't think this is all that much of a maintenance hassle... 
after all, how often do you add keywords to an existing language?

Handling this in a parser rule is also fully compatible with
backtracking or syntactic predicates, so you can easily resolve the
more ambiguous cases.