[antlr-interest] How to control token numbers manually?

David Piepgrass qwertie256 at gmail.com
Mon Jul 2 12:38:16 PDT 2007


> I'm not sure if lexers support vocabulary importing (I know
> parsers do), but if they do then you should be able to do it that
> way -- make a tokens file and import it into the lexer.  Worth a
> try, anyway :)

Ahh, of course, I should have tried that.

And happily, it works! But I found the following caveats.

There is a bug that occurs when ANTLR imports and then exports a
backslash. So if I have

parser grammar FooParser;
options {
       tokenVocab=Foo;
...
lexer grammar Foo;
options {
       tokenVocab=Foo2;

// In Foo2.tokens
'\\'=25
// In the generated Foo.tokens
'\'=25
'\\'=31         // Added by ANTLR

This causes a syntax error when compiling the parser. And I guess
there is another bug in ANTLRWorks because after the syntax error,
ANTLRWorks will keep repeating the same error every time you try to
Generate Code, until you quit and restart the program.

By the way, I found that

'\\\\'=25

seems to work as a single backslash.

There is another important caveat: ANTLR cannot handle "holes"
(unused numbers in the list of tokens) when importing tokens into the
parser. You must start numbering tokens at 4 and continue up from
there with consecutive integers. The problem is that the tokenNames[]
array in your parser is packed, with no empty elements left for the
unused numbers, so if your tokens are

APPLE=4
GRAPE=5
LEMON=9
PEAR=10

then your token array will be

public static readonly string[] tokenNames = new string[]
{
       "<invalid>",
       "<EOR>",
       "<DOWN>",
       "<UP>",
       "APPLE",
       "GRAPE",
       "LEMON",
       "PEAR"
};

Therefore, token name lookups will not work correctly: LEMON (9) and
PEAR (10) sit at indexes 6 and 7, and index 9 is past the end of the
array.
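
To check a .tokens file for this problem before running ANTLR, a
small script along these lines might help. It is only a sketch in
Python (not part of ANTLR): the function names are made up for
illustration, and packed_names() simply models the packed array shown
above rather than calling any ANTLR code.

```python
# First four slots as they appear in the generated tokenNames array above.
RESERVED = ["<invalid>", "<EOR>", "<DOWN>", "<UP>"]


def parse_tokens(text):
    """Parse NAME=number lines into a {name: number} dict."""
    tokens = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last '=' so a literal such as '='=7 still parses.
        name, _, number = line.rpartition("=")
        tokens[name] = int(number)
    return tokens


def find_holes(tokens, first=4):
    """Return unused numbers between `first` and the highest token number."""
    used = set(tokens.values())
    if not used:
        return []
    return [n for n in range(first, max(used) + 1) if n not in used]


def packed_names(tokens):
    """Model the packed array: names in numeric order with no gaps."""
    return RESERVED + sorted(tokens, key=tokens.get)


def mismatches(tokens):
    """List (name, number, found) where names[number] is not the token."""
    names = packed_names(tokens)
    bad = []
    for name, number in tokens.items():
        found = names[number] if number < len(names) else None
        if found != name:
            bad.append((name, number, found))
    return bad


SAMPLE = "APPLE=4\nGRAPE=5\nLEMON=9\nPEAR=10\n"
tokens = parse_tokens(SAMPLE)
print(find_holes(tokens))   # [6, 7, 8]
print(mismatches(tokens))   # [('LEMON', 9, None), ('PEAR', 10, None)]
```

An empty result from find_holes() means the numbering is consecutive
from 4, which is the only layout I found to work.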

On the plus side, you do not have to define all tokens in your .tokens
file; ANTLR can add any additional tokens you define in the lexer and
will number them correctly.
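
A partial .tokens file still has to follow the consecutive-numbering
rule. As a rough sketch (Python again, with hypothetical helper names
that are not part of ANTLR), an existing list could be closed up like
this. Note that it changes the token numbers, so it only helps if the
exact values do not matter to you.

```python
def renumber(tokens, first=4):
    """Assign consecutive numbers from `first`, keeping the numeric order."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    return {name: first + i for i, (name, _) in enumerate(ordered)}


def render(tokens):
    """Format a {name: number} dict as NAME=number lines."""
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    return "".join(f"{name}={number}\n" for name, number in ordered)


original = {"APPLE": 4, "GRAPE": 5, "LEMON": 9, "PEAR": 10}
print(render(renumber(original)), end="")
# APPLE=4
# GRAPE=5
# LEMON=6
# PEAR=7
```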

P.S. I'm using the C# target; perhaps YMMV for Java etc.

