[antlr-interest] languages without reserved words

Thu Mar 2 03:03:43 PST 2006

> is there a standard way or a best practice to implement grammars for languages 
> that allow identifiers to be anything, including the keywords of the language 
> itself?

There are several ways, and all of them suck somehow.
      * stateful lexing - create a lexer that keeps track of a lexical
        state and only checks for keywords in certain states - pro: you
        always know what your token is - con: pretty complicated and
        messy to implement in ANTLR at the moment (might get better soon
        with island grammars!)
      * disambiguation with predicates - have the lexer always return
        keyword tokens for keyword spellings, have a big rule called
        "identifier" that lists NCNAME and all your keywords, and then
        use syntactic predicates in the parser to work around the
        ambiguities. This is a little ugly and may be slow.
      * wait for ANTLR 3 - in ANTLR 3 your problem might get solved
        without the syntactic predicates
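
A rough sketch of the second option, written in Python rather than as
an ANTLR grammar so that it runs standalone. The toy language ("print
x" or "x = y"), the token names and the helper functions are all made
up for illustration. The lexer always emits keyword tokens, the
"identifier" rule accepts NAME or any keyword, and a one-token
lookahead check stands in for the syntactic predicate.

```python
import re

# Hypothetical toy language: "print <identifier>" or
# "<identifier> = <identifier>". Any keyword may be used as an identifier.
KEYWORDS = {"print", "let"}
# The "identifier" rule: a plain NAME or any keyword token.
IDENT_KINDS = {"NAME"} | {k.upper() for k in KEYWORDS}
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|(=))")


def lex(text):
    """Context-free lexer: a keyword spelling always becomes a keyword token."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        word, equals = match.groups()
        if equals:
            tokens.append(("EQ", "="))
        elif word in KEYWORDS:
            tokens.append((word.upper(), word))
        else:
            tokens.append(("NAME", word))
        pos = match.end()
    return tokens


def identifier(tokens, i):
    """Accept NAME or any keyword token as an identifier."""
    if i < len(tokens) and tokens[i][0] in IDENT_KINDS:
        return tokens[i][1]
    raise SyntaxError(f"identifier expected at token {i}")


def parse_statement(tokens):
    kinds = [kind for kind, _ in tokens]
    # Stands in for a syntactic predicate: (identifier EQ) => assignment.
    if len(kinds) >= 2 and kinds[0] in IDENT_KINDS and kinds[1] == "EQ":
        tree, used = ("assign", identifier(tokens, 0), identifier(tokens, 2)), 3
    elif kinds[:1] == ["PRINT"]:
        tree, used = ("print", identifier(tokens, 1)), 2
    else:
        raise SyntaxError("statement expected")
    if used != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree
```

With this, parse_statement(lex("print = let")) comes out as an
assignment to a variable named "print", while
parse_statement(lex("print print")) is a print statement whose operand
is a variable named "print". In a real grammar the lookahead can be
arbitrarily long, which is where the slowness comes from.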

Stateful lexing is nice because it keeps the complexity within the
lexer. But the tokens now depend on the state built up so far, so you
cannot just stop lexing somewhere in the source file and restart
lexing later. Handling lexical errors is also a lot more difficult.
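
To make the trade-off concrete, here is a minimal stateful lexer
sketch, again in Python and not in ANTLR. The "select ... from ..."
toy language, the StatefulLexer class and its expect_keyword flag are
assumptions for illustration only. Keywords are recognised only when
the state says a keyword may appear; everything else is a NAME.

```python
import re

# Hypothetical toy language: "select <name> from <name>", where a name
# may be spelled like a keyword.
KEYWORDS = {"select", "from"}
WORD_RE = re.compile(r"[A-Za-z_]\w*")


class StatefulLexer:
    def __init__(self, expect_keyword=True):
        # Lexical state: may the next word be a keyword?
        self.expect_keyword = expect_keyword

    def lex(self, text):
        tokens = []
        for word in WORD_RE.findall(text):
            if self.expect_keyword and word in KEYWORDS:
                tokens.append((word.upper(), word))
                # A name must follow a keyword in this toy language.
                self.expect_keyword = False
            else:
                tokens.append(("NAME", word))
                self.expect_keyword = True
        return tokens
```

StatefulLexer().lex("select select from from") yields SELECT, NAME,
FROM, NAME, so the parser needs no predicates. The restart problem
shows up as soon as you resume in the middle: a fresh lexer run on the
tail "select from from" starts in the wrong state and returns SELECT,
NAME, FROM, whereas StatefulLexer(expect_keyword=False) gives the
correct NAME, FROM, NAME. You have to save and restore the state
together with the position.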

Martin