[antlr-interest] Keywords not context-free

Tue Oct 16 02:46:11 PDT 2007

At 10:32 16/10/2007, Clifford Heath wrote:
 >In a grammar I'm working on, I have a significant subset of the
 >language that allows arbitrary-length strings of identifiers,
 >where words that are keywords elsewhere (like "is") may be used
 >as regular identifiers.
 >
 >Is there a general way to handle this kind of context-sensitivity
 >in ANTLR (as in, backup and retry if the grammar reports an
 >error), or must I explore traditional methods of informing the
 >lexer?

The general solution is not to do it in the lexer at all, but 
rather to do it in the parser instead.

If you've got a particular string of characters, say "foo", that 
might be a "foo keyword" or a "foo identifier", then in the lexer 
simply recognise it as "some kind of foo" and don't assign any 
semantic meaning to it until you get the surrounding context in 
the parser.

The simplest way to do this is to make a catch-all identifier 
rule, similar to this:

identifier
   : IDENTIFIER | FOO | BAR | BAZ
   ;

Here FOO, BAR, and BAZ are tokens representing those specific 
"words", and IDENTIFIER matches any other sequence of letters.  
The identifier rule therefore accepts any of them wherever an 
identifier is expected, while other rules can still refer to the 
FOO, BAR, and BAZ tokens directly as keywords.
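
For what it's worth, a slightly fuller sketch of the same idea, 
using the "is" keyword from the original question, might look 
like the following.  The grammar name, the statement rule, and the 
exact token definitions are made up for illustration (ANTLR 3 
syntax assumed), so treat it as a starting point rather than a 
tested grammar:

grammar KeywordDemo;

// Hypothetical rule in which "is" acts as a keyword between two names.
statement
   : identifier IS identifier ';'
   ;

// Catch-all rule: IS is also accepted wherever a name is expected.
identifier
   : IDENTIFIER | IS
   ;

// Keyword tokens are listed before IDENTIFIER so that "is" lexes as IS.
IS         : 'is' ;
IDENTIFIER : ('a'..'z' | 'A'..'Z')+ ;
WS         : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;

With something like this, both "foo is bar;" and "is is is;" 
should parse as a statement: the lexer always emits IS for "is", 
and only the parser decides whether a given IS is the keyword or 
just another name.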