[antlr-interest] trouble with ids and keywords

Sat Feb 7 05:16:34 PST 2009

Gavin Lambert wrote:
> At 11:07 7/02/2009, Bob Marinier wrote:
> >I'm using antlr 2.7.6 and I have a problem with keywords and
> >identifiers conflicting. Specifically, if I have an identifier
> >that starts with a keyword, then the beginning gets picked up
> >as a keyword, as opposed to the whole thing getting recognized
> >as an identifier. For example, one of my keywords is "new". If
> >the input contains "newX", then this gets tokenized as the
> >"new" keyword and an identifier "X", whereas I want just an
> >identifier "newX". That is, I want the identifier rule to
> >be greedy, and only check the literals table after it's read
> >as much as it can.
>
> One of the classic resolutions to this problem is to avoid matching 
> the keywords in the lexer at all -- match them all just as IDs in the 
> lexer, and then test the text of the ID in the parser to verify 
> whether it's a keyword or not.  (If you're outputting an AST, you can 
> then swap it to a keyword token type, if you want.)
>
The problem is that IDs in my system don't allow dashes, but some 
keywords have dashes in them. The lexer's ID rule never matches those 
keywords as a single token, so the parser can't recognize them. It does 
seem to work if I put all of the keywords with dashes in the lexer, and 
the ones without dashes in the parser. Having the keyword list split 
across two locations makes me cringe a bit, but maybe this is the 
cleanest solution?
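
As an illustration only, the greedy "read as much as possible, then check the keyword table" behavior discussed above can be sketched in plain Java. This is not ANTLR 2.7.6 code and not a generated lexer. The class and method names are made up for the sketch, and "new-scope" is a hypothetical dashed keyword ("new" is the only keyword actually mentioned in this thread). The sketch assumes identifier characters are letters, digits and underscore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class GreedyKeywordLexer {
    // One keyword table for everything. "new" is from the thread;
    // "new-scope" is a hypothetical dashed keyword invented for this sketch.
    static final Set<String> KEYWORDS = Set.of("new", "new-scope");

    // Assumption: identifiers consist of letters, digits and underscore.
    static boolean isIdChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Returns the index just past the longest run beginning at 'start' made of
    // identifier characters and, when allowDash is true, dashes as well.
    static int scan(String s, int start, boolean allowDash) {
        int i = start;
        while (i < s.length()
                && (isIdChar(s.charAt(i)) || (allowDash && s.charAt(i) == '-'))) {
            i++;
        }
        return i;
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
                continue;
            }
            if (!isIdChar(c)) {
                // Anything else (including a stray dash) becomes a one-character token.
                tokens.add("SYMBOL(" + c + ")");
                pos++;
                continue;
            }
            // Step 1: try the longest run that may include dashes, and accept it
            // only if the whole run is a dashed keyword.
            int dashedEnd = scan(input, pos, true);
            String dashed = input.substring(pos, dashedEnd);
            if (dashed.indexOf('-') >= 0 && KEYWORDS.contains(dashed)) {
                tokens.add("KEYWORD(" + dashed + ")");
                pos = dashedEnd;
                continue;
            }
            // Step 2: otherwise read the longest dash-free identifier and only
            // then look it up in the same table.
            int end = scan(input, pos, false);
            String word = input.substring(pos, end);
            tokens.add((KEYWORDS.contains(word) ? "KEYWORD(" : "ID(") + word + ")");
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("new newX new-scope new-x"));
    }
}
```

Running main prints [KEYWORD(new), ID(newX), KEYWORD(new-scope), KEYWORD(new), SYMBOL(-), ID(x)], so "newX" stays a single identifier and the keyword list lives in one place. Note that the sketch only tests the whole dashed run against the table, so a dashed keyword immediately followed by another dash or identifier character is not recognized as a keyword.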

Thanks,
Bob