[antlr-interest] languages without reserved words
Martin Probst
mail at martin-probst.com
Thu Mar 2 03:03:43 PST 2006
> is there a standard way or a best practice to implement grammars for languages
> that allow identifiers to be anything, including the keywords of the language
> itself?
There are several ways, and all of them suck somehow.
* stateful lexing - create a lexer that keeps track of a lexical
state and only checks for keywords in certain states - pro: you
always know what your token is - con: pretty complicated and
messy to implement in ANTLR at the moment (might get better soon
with island grammars!)
* disambiguation with predicates - always return keywords as
keyword tokens, have one big rule called "identifier" that lists
NCNAME and all your keywords, and then use syntactic predicates in
the parser to work around the ambiguities. This is a little ugly
and may be slow.
* wait for ANTLR 3 - in ANTLR 3 your problem might get solved
without the syntactic predicates
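To make the second option concrete, here is a rough sketch in plain Python rather than ANTLR. Everything in it is made up for illustration: the toy "let NAME = NAME" statement, the keyword set, and the function names. The lookahead check in statement() is only a hand-written stand-in for what a syntactic predicate would do in the parser.

```python
import re

# Hypothetical keyword set for a toy language (assumption, not from ANTLR).
KEYWORDS = {"let", "if", "then"}

def tokenize(text):
    """Stateless lexer: a keyword spelling is always a keyword token."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|=", text):
        if word in KEYWORDS:
            tokens.append((word.upper(), word))
        elif word == "=":
            tokens.append(("EQ", word))
        else:
            tokens.append(("NCNAME", word))
    return tokens

def identifier(tokens, pos):
    """The big 'identifier' rule: NCNAME or any keyword token."""
    if pos < len(tokens):
        kind, text = tokens[pos]
        if kind == "NCNAME" or text in KEYWORDS:
            return text, pos + 1
    raise SyntaxError("identifier expected at token %d" % pos)

def statement(tokens, pos=0):
    """Parse either 'let NAME = NAME' or a bare NAME reference."""
    # Stand-in for a syntactic predicate: LET only starts a declaration
    # if the token after the next one is EQ. Otherwise it is just a name.
    if (tokens[pos][0] == "LET"
            and pos + 2 < len(tokens)
            and tokens[pos + 2][0] == "EQ"):
        name, pos = identifier(tokens, pos + 1)
        value, pos = identifier(tokens, pos + 1)  # skip the EQ token
        return ("let", name, value)
    name, pos = identifier(tokens, pos)
    return ("ref", name)

print(statement(tokenize("let let = if")))
print(statement(tokenize("let")))
```

The lexer stays trivial, but every place in the parser where a keyword could also be a name needs this kind of extra lookahead, which is where the ugliness and the possible slowdown come from.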
Stateful lexing is nice because it keeps the complexity within the
lexer. But it means that you cannot just stop lexing somewhere in the
source file and restart lexing later - also handling lexical errors is a
lot more difficult.
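A minimal sketch of the stateful idea, again in plain Python and not ANTLR, assuming a toy language of "let NAME = NAME;" statements. The state names and the tokenize function are invented for this example.

```python
import re

# Hypothetical keyword set for the toy language (assumption).
KEYWORDS = {"let"}

def tokenize(text):
    """Stateful lexer: keywords are only recognised at statement start."""
    state = "STATEMENT_START"
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|[=;]", text):
        if word == ";":
            tokens.append(("SEMI", word))
            state = "STATEMENT_START"   # next word may be a keyword again
        elif word == "=":
            tokens.append(("EQ", word))
            state = "OPERAND"           # next word must be a name
        elif state == "STATEMENT_START" and word in KEYWORDS:
            tokens.append((word.upper(), word))
            state = "OPERAND"
        else:
            # In every other state a keyword spelling is a plain name.
            tokens.append(("NCNAME", word))
            state = "AFTER_OPERAND"
    return tokens

full = tokenize("let let = let;")
# Restarting the lexer in the middle of the same input loses the state:
restarted = tokenize("let = let;")

print(full)
print(restarted)
```

The parser only ever sees unambiguous tokens here. The restart problem shows up in the last two calls: the second "let" of the full input is an NCNAME, but lexing from that same position with a fresh lexer turns it into a LET keyword.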
Martin