[antlr-interest] Keywords not context-free
Gavin Lambert
antlr at mirality.co.nz
Tue Oct 16 02:46:11 PDT 2007
At 10:32 16/10/2007, Clifford Heath wrote:
>In a grammar I'm working on, I have a significant subset of the
>language that allows arbitrary-length strings of identifiers,
>where words that are keywords elsewhere (like "is") may be used
>as regular identifiers.
>
>Is there a general way to handle this kind of context-sensitivity
>in ANTLR (as in, backup and retry if the grammar reports an
>error), or must I explore traditional methods of informing the
>lexer?

The general solution is not to do it in the lexer at all, but
rather to do it in the parser instead.

If you've got a particular string of characters, say "foo", that
might be a "foo keyword" or a "foo identifier", then in the lexer
simply recognise it as "some kind of foo" and don't assign any
semantic meaning to it until you get the surrounding context in
the parser.

The simplest way to do this is to make a catch-all identifier
rule, similar to this:

    identifier
        : IDENTIFIER | FOO | BAR | BAZ
        ;

Where FOO, BAR, and BAZ are tokens representing those specific
"words", and IDENTIFIER accepts any other sequence of letters
strung together. Consequently the identifier rule will accept any
of these in an identifier context, and you can also refer to the
FOO, BAR, and BAZ tokens as keywords in some other context.
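
As a slightly fuller sketch of the same idea, assuming ANTLR 3
syntax: the grammar name, the statement rule, the lexer definitions
and the whitespace handling below are all made up for illustration
and are not taken from the grammar in question. FOO is used as a
keyword in the statement rule, yet is still accepted wherever an
identifier is expected.

```
grammar KeywordDemo;   // hypothetical grammar name

// Parser rules

// FOO is a keyword here: it must appear literally between two names.
statement
    : identifier FOO identifier ';'
    ;

// Catch-all rule: any "word" token is acceptable as a name.
identifier
    : IDENTIFIER | FOO | BAR | BAZ
    ;

// Lexer rules. The keyword tokens are listed before IDENTIFIER so that
// an exact match such as "foo" becomes FOO rather than IDENTIFIER.
FOO : 'foo' ;
BAR : 'bar' ;
BAZ : 'baz' ;

IDENTIFIER
    : ('a'..'z' | 'A'..'Z')+
    ;

// Whitespace goes to the hidden channel (ANTLR 3 action syntax).
WS  : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; }
    ;
```

With a grammar along these lines, both "x foo y;" and "foo foo foo;"
should be accepted. In the second input the first and third "foo"
are taken as identifiers and the middle one as the keyword, because
the parser decides from position rather than the lexer deciding
from spelling.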