[antlr-interest] trouble with ids and keywords

Fri Feb 6 18:52:30 PST 2009

At 11:07 7/02/2009, Bob Marinier wrote:
 >I'm using antlr 2.7.6 and I have a problem with keywords and
 >identifiers conflicting. Specifically, if I have an identifier
 >that starts with a keyword, then the beginning gets picked up
 >as a keyword, as opposed to the whole thing getting recognized
 >as an identifier. For example, one of my keywords is "new". If
 >the input contains "newX", then this gets tokenized as the
 >"new" keyword and an identifier "X", whereas I want just an
 >identifier "newX". That is, I want the identifier rule to
 >be greedy, and only check the literals table after it's read
 >as much as it can.

One of the classic resolutions to this problem is to not match
the keywords in the lexer at all: match every word as a plain ID,
so that the lexer always consumes the longest possible name, and
then test the text of the ID in the parser to decide whether it
is a keyword.  (If you're outputting an AST, you can then swap
the node to a keyword token type, if you want.)
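
To illustrate the idea independently of any ANTLR version, here
is a minimal sketch in Python rather than ANTLR grammar syntax.
It is not ANTLR's API: the names KEYWORDS, TOKEN_RE and tokenize
are made up for the example, the keyword set is hypothetical
apart from "new", and the keyword test runs right after each word
is read rather than inside parser rules. The point is only that
the whole word is consumed first and its complete text is looked
up afterwards.

```python
import re

# Hypothetical keyword set; only "new" comes from the question.
KEYWORDS = {"new", "delete", "if"}

# One rule for all words: an identifier is matched greedily, so
# "newX" is consumed as a single lexeme. Any other non-blank
# character becomes a one-character OTHER token.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ID>[A-Za-z_][A-Za-z_0-9]*)|(?P<OTHER>\S))"
)


def tokenize(text):
    """Return (type, text) pairs; keywords are decided after matching."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        kind = m.lastgroup          # "ID" or "OTHER"
        lexeme = m.group(kind)
        # Only the complete lexeme is compared with the keyword set,
        # so a keyword prefix inside a longer name is never split off.
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("new newX"))
```

With this arrangement "new" on its own is reported as a keyword,
while "newX" stays a single identifier.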