[antlr-interest] ENHANCEMENT - Add option to output tokens as symbolic constants (enum)

Austin Hastings Austin_Hastings at Yahoo.com
Mon Oct 15 18:38:14 PDT 2007


In looking over the gunit source code, I find several lines that look 
like the following:

    if ( ts.testSuites.get(input).getType()==27 ) {
       // ...
    }
    else if ( ts.testSuites.get(input).getType()==28 ) {
       // ...
    }

Guess what that does?

One answer, of course, is that it encodes the transient results of the 
token generation process in a way that is disconnected from the process, 
so that if tokens are regenerated (such as by me, trying to add some 
features) the old numbers become meaningless.

Another answer is that it sprinkles magic numbers through the code.

Both answers are considered less than "good" programming practice. I 
understand why it was done, and how to fix it. But I suspect this kind 
of greasy-elbows access of the innards is pretty common, especially 
since ANTLR v3 is pretty lax about data hiding.

I propose that a mechanism be added (a StringTemplate template, most 
likely) to convert or extend the .tokens mechanism into a formal file 
declaring some sort of symbolic constants. Since the tokens in question 
can be arbitrary strings, a Java HashMap is probably the right model, 
with something similar in C++ and a bsearch-ed array of const char * in C.

The "named" tokens - those given explicit rules in the grammar - are 
available as constants in the lexer class. But it isn't quite enough. 
The ability to use string literals in the grammar generates too many 
automatic token names. The "symbolic name" given to the "27" value above 
appears in gUnitLexer.java as:

    public static final int T27=27;

which certainly is a symbolic name, but also encodes the transient 
results of the token generation process in a way that is disconnected 
from the process. What is needed is something like

    t27 = gUnitLexer.getToken("'OK'"); // Note single quotes inside doubles

that can look up both immediate strings and rule names.
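To make the idea concrete, here is a minimal sketch of such a lookup in Java. Nothing here is existing ANTLR API: TokenTable, fromLines and this getToken are hypothetical names, and the sketch assumes the generated .tokens file holds one NAME=number or 'literal'=number entry per line.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ANTLR: maps rule names and quoted
// literals to the token types assigned by the current generation run.
final class TokenTable {
    private final Map<String, Integer> types = new HashMap<>();

    // Assumption: each line looks like ID=4 or 'OK'=27.
    static TokenTable fromLines(List<String> lines) {
        TokenTable table = new TokenTable();
        for (String line : lines) {
            String trimmed = line.trim();
            // Split on the LAST '=' because a literal such as '=' contains one.
            int eq = trimmed.lastIndexOf('=');
            if (eq <= 0) {
                continue; // blank or malformed line; skip it
            }
            String name = trimmed.substring(0, eq);
            int type = Integer.parseInt(trimmed.substring(eq + 1).trim());
            table.types.put(name, type);
        }
        return table;
    }

    // Returns the token type for a rule name or a quoted literal.
    int getToken(String nameOrLiteral) {
        Integer type = types.get(nameOrLiteral);
        if (type == null) {
            throw new IllegalArgumentException("unknown token: " + nameOrLiteral);
        }
        return type;
    }
}
```

With a table like this, the gunit comparison could be written against table.getToken("'OK'") instead of the literal 27, so regenerating the tokens would not silently change what the code means. An unknown name fails loudly instead of matching the wrong token.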

=Austin


