[antlr-interest] RFE: bring back (at least the global) testLiterals option!

Fri Dec 28 14:14:32 PST 2007

Hi,

I'm thinking of using the "implicit strategy" for keyword lexing
described in
http://www.antlr.org/wiki/pages/viewpage.action?pageId=1802308
because it closely matches how the reference implementation of
the language I'm parsing does it, and it was very easy in ANTLR 2.7.7 using

tokens {
   KWD1 = "kwd1";
   KWD2 = "kwd2";
   ...
   KWD252 = "kwd252";
}

with global options { testLiterals=false; caseSensitiveLiterals=false; }

and rule options { testLiterals=true; } on the rule for identifiers.
The result was clear, correct, and compact.

To do the same thing in ANTLR 3, I have to supply my own
CheckKeywordsTable() that implements the intended matching rules,
and use it explicitly in the identifier rule.  That's no big deal.
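
For concreteness, here is a minimal sketch of the kind of table I
mean. The class name KeywordTable, its method names, and the token
type numbers are all made up for illustration; a real lexer would
use the token type constants ANTLR generates. Only the idea (a
case-insensitive lookup called from the identifier rule) comes from
the description above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of ANTLR: maps keyword text to token types.
class KeywordTable {
    // Placeholder token types; assumed values, not the generated ones.
    static final int ID = 4;
    static final int KWD1 = 5;
    static final int KWD2 = 6;

    private final Map<String, Integer> table = new HashMap<String, Integer>();

    // Register a keyword; the key is lower-cased so lookups ignore case,
    // mimicking caseSensitiveLiterals=false.
    void add(String keyword, int tokenType) {
        table.put(keyword.toLowerCase(Locale.ROOT), Integer.valueOf(tokenType));
    }

    // Return the keyword's token type, or defaultType if text is not a keyword.
    int check(String text, int defaultType) {
        Integer type = table.get(text.toLowerCase(Locale.ROOT));
        return type != null ? type.intValue() : defaultType;
    }
}
```

The identifier rule's action would then pass the matched text to
check() with ID as the default, and set the token type to whatever
comes back.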

What's unpleasant is that ANTLR hasn't preserved my nice, clear
tokens { } block in any form I can use for the purpose. An
array of the keyword strings would offer a way for Java code
to recover and use them. Instead, they are spread out, one each, in
a pile of mKWD1() ... mKWD252() methods. Those aren't even
useful, because they call match() on the kwd strings directly
with no chance to apply custom matching rules. An mTokens()
method is also created with a very large predictor DFA that's equally
useless for the same reason. (I suppose I might recover the
strings out of the mKWD methods using reflection and a subclass
that overrides match() ... but that's the sort of thing one only
does to make a point. ;)

I guess in order to avoid creating the useless mKWD...() methods
and huge DFA, I would need a tokens { } block that gives only the
bare KWD... names, and leaves off the = 'kwd...' strings. Then,
to initialize my CheckKeywordsTable, I'll need somewhere to add
252 *more* lines that repeat (without typos) the token names from
the tokens { } block, and associate them with the corresponding
strings. Sure, it's no big deal to generate that with vi commands
from my original 2.7.7 tokens block. But that only confirms the
problem: the source grows from about 250 lines to about 500
without adding any new information, and it creates two source
tables that now have to be edited in sync. What I like
about ANTLR is that it's a really clear, concise way to describe
a grammar, but this has made it less so.

I can see how the default behavior would be good in default
situations, but for situations like this it might be really
nice to bring back the testLiterals option, just as a global-only
option.  Setting testLiterals=false globally would simply suppress
all the auto-generated mKWD...() methods and the big honkin' DFA,
and just generate a nice array of keywords that a custom
CheckKeywordsTable method can be initialized from. That would
allow the grammar to be about as clear and compact as it was before.
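
To make the request concrete, here is a rough sketch of what such
generated output might look like and how a table could be built
from it. None of this exists in ANTLR 3: the class name
GeneratedKeywords, the two parallel arrays, and the token type
numbers are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical: what a lexer generated with testLiterals=false could expose.
class GeneratedKeywords {
    // Keyword strings taken from the tokens { } block (assumed layout).
    static final String[] KEYWORD_TEXT = { "kwd1", "kwd2" };
    // Token types at matching indexes (placeholder values).
    static final int[] KEYWORD_TYPE = { 5, 6 };

    // Build a case-insensitive lookup table from the two arrays.
    static Map<String, Integer> buildTable() {
        Map<String, Integer> table = new HashMap<String, Integer>();
        for (int i = 0; i < KEYWORD_TEXT.length; i++) {
            table.put(KEYWORD_TEXT[i].toLowerCase(Locale.ROOT),
                      Integer.valueOf(KEYWORD_TYPE[i]));
        }
        return table;
    }
}
```

With something like this, the keyword strings would live only in
the tokens { } block, and the hand-written side would shrink to the
loop in buildTable().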

There's my suggestion, and I won't even charge the $0.02.

-Chap