[antlr-interest] Source positions for imaginary tokens

Jim Idle jimi at temporal-wave.com
Fri Sep 14 10:28:05 PDT 2012

That’s one way to do it, and it can reduce the size of the generated code a
lot. I have done that when retro-fitting a SQL grammar that was not written
so well, for instance. For the record, in C/C++ use gperf or cmph for this
(depending on the number of keywords).
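The idea is to look every word up in a keyword table at lex time. gperf or cmph would generate a perfect hash in C/C++. The minimal C sketch below is a simpler stand-in that uses a sorted table and the C standard library's `bsearch`. The token type names (`TOK_ID`, `TOK_SELECT`, ...) and the `classify_word` function are hypothetical and are not part of any ANTLR runtime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token types; a real grammar would take these from the
 * generated token header. */
enum { TOK_ID = 4, TOK_SELECT, TOK_FROM, TOK_WHERE, TOK_ORDER };

struct keyword {
    const char *text;
    int type;
};

/* Must stay sorted by text (strcmp order) for bsearch to work. */
static const struct keyword keywords[] = {
    { "from",   TOK_FROM },
    { "order",  TOK_ORDER },
    { "select", TOK_SELECT },
    { "where",  TOK_WHERE },
};

/* bsearch passes the key first and the array element second. */
static int compare_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct keyword *)elem)->text);
}

/* Returns the keyword token type for text, or TOK_ID if it is not a keyword. */
int classify_word(const char *text)
{
    const struct keyword *kw;

    assert(text != NULL);
    kw = bsearch(text, keywords, sizeof keywords / sizeof keywords[0],
                 sizeof keywords[0], compare_keyword);
    return kw != NULL ? kw->type : TOK_ID;
}
```

The lookup is case-sensitive, so a SQL lexer would normally fold case before calling it. Swapping in a gperf- or cmph-generated function would change only the body of `classify_word`.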

However, if you build the keyword set first, then the keywordsAsID rule,
you will find that having to use that rule will guide you to a better
grammar, for what should be obvious reasons.

There are some occasions where the language is so mad crazy that you can’t
beat it into submission without a little lateral thinking; in those
cases, a honey badger may help :)


*From:* Jesse McGrew [mailto:jmcgrew at gmail.com]
*Sent:* Friday, September 14, 2012 10:13 AM
*To:* Mike Lischke
*Cc:* antlr-interest; Jim Idle
*Subject:* Re: [antlr-interest] Source positions for imaginary tokens

For what it's worth, I tried to solve the "keywords as identifiers" problem
using what seems to be the recommended solution -- a parser rule that
accepts keywords as well as ID -- and could not get it to work: Antlr would
crash when I tried to generate code, which I assumed was because of the
number of alternatives I was adding to every place an identifier could appear.

Instead, I ended up getting rid of the lexer rules for keywords, so every
keyword is lexed as ID, and overriding the "emit" function to look the text
up in a keyword hash table and set a field in the token. Keywords are
matched with parser rules that use gating semantic predicates to check that

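Jesse's approach can be sketched in plain C. This is a minimal illustration under stated assumptions. The `struct token` layout, its `keyword` field, and the `on_emit` and `is_keyword` functions are all hypothetical names, not the ANTLR C runtime's API. A linear scan also stands in for the hash table he describes. Every word keeps the ID type. The emit hook only records which keyword, if any, the text matches, and a gating predicate in a keyword rule would then test that field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token layout and constants; not the ANTLR runtime's types. */
enum { TOK_ID = 4 };
enum keyword_id { KW_NONE = 0, KW_SELECT, KW_FROM, KW_WHERE };

struct token {
    int type;                 /* every word stays TOK_ID */
    const char *text;
    enum keyword_id keyword;  /* filled in by the emit hook */
};

static const struct { const char *text; enum keyword_id id; } keyword_table[] = {
    { "select", KW_SELECT },
    { "from",   KW_FROM },
    { "where",  KW_WHERE },
};

/* Stand-in for an overridden emit: leaves the type alone and records the
 * keyword lookup result (a linear scan here instead of a hash table). */
void on_emit(struct token *tok)
{
    size_t i;

    assert(tok != NULL && tok->text != NULL);
    tok->keyword = KW_NONE;
    for (i = 0; i < sizeof keyword_table / sizeof keyword_table[0]; i++) {
        if (strcmp(tok->text, keyword_table[i].text) == 0) {
            tok->keyword = keyword_table[i].id;
            break;
        }
    }
}

/* The test a gating predicate guarding a keyword rule would perform on the
 * lookahead token before allowing that rule to match an ID. */
int is_keyword(const struct token *tok, enum keyword_id kw)
{
    return tok->type == TOK_ID && tok->keyword == kw;
}
```

The token type never changes, so every parser rule that expects an identifier can still match ID. Only the rules that need a particular keyword pay for the predicate check.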

On Sep 12, 2012 11:53 PM, "Mike Lischke" <mike at lischke-online.de> wrote:

Hey Jim,

> Sure, I can make it be either of those calls, but not both at once. I have
> no context at code generation time that can tell me which one to generate.

So you're saying you don't know at this point what type $kw in the ID[$kw]
expression is? There is absolutely no way to determine whether that is a
string or a token reference? That's odd.

> If I change it to this, then all the people who want it to be the other
> way will claim that they have found a bug too. It only works in Java
> because the Java compiler can see what the argument types are, and can
> therefore call the “correct” method.

I understand your restrictions, but find this situation anything but
pleasant (and I'm not alone, I'm afraid).

> However, it is much simpler to just use code to operate on the token
> directly. Even before that, you should consider whether you need to change
> something about the token because a later stage MUST receive a different
> token, or whether you just think that you WANT it to.

In the keyword case, it is just that I need only one (very common) token
type but want to retain the token text for later processing. It's
unfortunate that there is no general solution for the frequently
encountered keywords-as-identifiers problem.
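Operating on the token directly, rather than rewriting with `ID[$kw]`, amounts to changing the type of the existing token and leaving its text alone. Below is a minimal C sketch that assumes a hypothetical `struct token` with `type` and `text` fields and hypothetical token type constants. The real ANTLR runtime token object differs, and `retype_as_identifier` is an illustrative name for code that would run in a keywordsAsID-style rule's action.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types; assumes the keyword types are contiguous. */
enum { TOK_ID = 4, TOK_SELECT, TOK_FROM, TOK_WHERE };

/* Hypothetical token layout, for illustration only. */
struct token {
    int type;
    const char *text;
};

static int is_keyword_type(int type)
{
    return type >= TOK_SELECT && type <= TOK_WHERE;
}

/* When a keyword appears where an identifier is allowed, retype the existing
 * token to ID in place. The text ("select", "from", ...) is not touched, so
 * later stages can still read it. Non-keyword tokens are left unchanged. */
void retype_as_identifier(struct token *tok)
{
    assert(tok != NULL);
    if (is_keyword_type(tok->type))
        tok->type = TOK_ID;
}
```

Because the same token object is reused, later stages see a single ID type but keep the original keyword text.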

Anyway, Jim, thanks for patience and time!


List: http://www.antlr.org/mailman/listinfo/antlr-interest
