[antlr-interest] ENHANCEMENT - Add option to output tokens as symbolic constants (enum)
Austin Hastings
Austin_Hastings at Yahoo.com
Mon Oct 15 18:38:14 PDT 2007
In looking over the gunit source code, I find several lines that look
like the following:
    if ( ts.testSuites.get(input).getType()==27 ) {
        // ...
    }
    else if ( ts.testSuites.get(input).getType()==28 ) {
        // ...
    }
Guess what that does?
One answer, of course, is that it encodes the transient results of the
token generation process in a way that is disconnected from the process,
so that if tokens are regenerated (such as by me, trying to add some
features) the old numbers become meaningless.
Another answer is that it sprinkles magic numbers through the code.
Both answers are considered less than "good" programming practice. I
understand why it was done, and how to fix it. But I suspect this kind
of greasy-elbows access of the innards is pretty common, especially
since ANTLR v3 is pretty lax about data hiding.
I propose that a mechanism be added (a StringTemplate template, most
likely) to convert the .tokens file into, or supplement it with, a
generated source file that declares some sort of symbolic constants.
Since the tokens in question can be arbitrary strings, a Java HashMap is
probably the right model - with something similar in C++, and a
bsearch()-ed array of const char * in C.
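To make the HashMap idea concrete, here is a minimal sketch in Java. It is not anything ANTLR generates: the TokensFile class and its parse() method are hypothetical names, and the code assumes the .tokens file holds one name=number entry per line, where the name is either a rule name or a single-quoted literal. The token names and numbers in main(), other than 'OK'=27, are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of ANTLR: builds a name -> token type map
// from the text of a .tokens file.
final class TokensFile {
    private TokensFile() {}

    // Assumes one "name=number" entry per line, where name is either a
    // rule name (ID) or a single-quoted literal ('OK').
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> types = new HashMap<>();
        for (String line : text.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Split on the LAST '=' so that a literal such as '=' survives.
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed entry: " + line);
            }
            String name = line.substring(0, eq);
            int type = Integer.parseInt(line.substring(eq + 1).trim());
            types.put(name, type);
        }
        return types;
    }

    public static void main(String[] args) {
        // Sample content; only 'OK'=27 comes from the gunit example above.
        String sample = "ID=4\n'OK'=27\n'FAIL'=28\n'='=29\n";
        Map<String, Integer> types = parse(sample);
        System.out.println(types.get("'OK'")); // prints 27
    }
}
```

Keying the map on the text exactly as it appears in the grammar, quotes included, keeps a literal 'OK' distinct from a rule that happens to be named OK.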
The "named" tokens - those given explicit rules in the grammar - are
available as constants in the lexer class. But that isn't quite enough:
every string literal used directly in the grammar gets an automatically
generated token name, and there are a lot of them. The "symbolic name"
given to the "27" value above appears in gUnitLexer.java as:
    public static final int T27=27;
which certainly is a symbolic name, but also encodes the transient
results of the token generation process in a way that is disconnected
from the process. What is needed is something like
    t27 = gUnitLexer.getToken("'OK'"); // Note single quotes inside doubles
that can look up both immediate strings and rule names.
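As a rough sketch of how such a lookup might read in Java: the GUnitTokens class, its getToken() method, and the ExpectationCheck caller below are all hypothetical, written by hand to show the shape a generated file could take. Only the 'OK'=27 pairing comes from the example above; 'FAIL'=28 and STRING=4 are assumed for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape for a generated token table; none of these names
// exist in ANTLR or gunit.
final class GUnitTokens {
    private static final Map<String, Integer> TYPES;
    static {
        Map<String, Integer> m = new HashMap<>();
        m.put("'OK'", 27);    // literal and value taken from the post
        m.put("'FAIL'", 28);  // assumed: the post does not say what 28 is
        m.put("STRING", 4);   // assumed rule name and value
        TYPES = Collections.unmodifiableMap(m);
    }

    private GUnitTokens() {}

    // Looks up a quoted literal or a rule name; fails loudly on a name
    // that the grammar no longer defines.
    static int getToken(String name) {
        Integer type = TYPES.get(name);
        if (type == null) {
            throw new IllegalArgumentException("Unknown token: " + name);
        }
        return type;
    }
}

final class ExpectationCheck {
    // Resolved once by name, so regenerating the tokens cannot silently
    // change what the comparisons below mean.
    static final int OK = GUnitTokens.getToken("'OK'");
    static final int FAIL = GUnitTokens.getToken("'FAIL'");

    static String describe(int tokenType) {
        if (tokenType == OK) {
            return "expect success";
        } else if (tokenType == FAIL) {
            return "expect failure";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(describe(GUnitTokens.getToken("'OK'")));
    }
}
```

If a literal is renamed or removed from the grammar, the lookup throws when the class is loaded instead of quietly comparing against a stale number.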
=Austin