[antlr-interest] IDENTifier rule not working for some tokens

Thu Oct 23 00:32:03 PDT 2008

At 11:27 23/10/2008, brainstorm wrote:
 >> options
 >> {
 >>         output = AST;
 >> //      backtrack = true;
 >>
 >> Don't use this unless there is no other readable way.
 >
 >What do you mean by that ? By the way, looks like it's the
 >preferred way for ANTLR if output is not defined:

I think he was referring to the backtrack option.  While it can 
sometimes be useful, it can significantly slow performance of the 
parser, so it's better to avoid it if possible.
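
As a rough illustration (my own sketch, not taken from the grammar 
under discussion): ANTLR 3 also accepts backtrack as a rule-level 
option, so the cost can be confined to the one decision that needs 
it.  The grammar name and all rules below are hypothetical.

```antlr
grammar BacktrackSketch;   // hypothetical grammar name

// Only this rule backtracks; every other decision stays plain LL(*).
// Both alternatives begin with 'declarator', which can nest
// parentheses to any depth, so a fixed lookahead DFA cannot tell
// the alternatives apart.
decl
options { backtrack = true; }
    : type declarator ';'
    | type declarator '{' '}'
    ;

declarator
    : '(' declarator ')'
    | ID
    ;

type : 'int' | 'char' ;

ID : ('a'..'z')+ ;
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```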

 >In fact, I hit a problem when defining those tokens:
 >
 >tokens {
 >(... other tokens defined...)
 >INT = 'INT';
 >}
 >
 >If I just declare "INT" (only LHS), ANTLR complains:
 >
 >warning(105): CL.g:120:14: no lexer rule corresponding to token: INT
 >
 >I have to keep writing redundant statements like: INT = 'INT';
 >why is that ?

Using INT by itself defines what's called an "imaginary token" -- 
one that cannot match any input by itself, but can be emitted from 
either the lexer or parser via explicit code.

Using INT='INT' defines a real token that matches that literal 
text in the input -- it's equivalent to defining the following 
rule at the top of your grammar:

INT: 'INT';

So it's neither redundant nor a duplication -- the left-hand side 
defines the name of the token while the right-hand side defines 
the text that it matches.
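
To make the two forms concrete, here is a small sketch of a 
combined grammar.  VARDEF, the grammar name and the rules are 
hypothetical names for illustration only; INT = 'INT' is the 
declaration from the question.

```antlr
grammar TokensSketch;   // hypothetical grammar name

options { output = AST; }

tokens {
    VARDEF;         // imaginary: a name only, matches no input text
    INT = 'INT';    // real: a name plus the literal text it matches
}

// INT is matched from the input; VARDEF never is. It only appears
// as the root node of the tree built by the rewrite.
decl : INT ID ';' -> ^(VARDEF INT ID) ;

ID : ('a'..'z')+ ;
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```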

If you did want to create an imaginary token for use in the lexer, 
there is one somewhat annoying quirk: declaring it in the tokens 
section also generates the warning you mentioned above.  You can 
either ignore this warning (it is only a warning, not an error), 
or remove the token from the tokens section and declare it as a 
rule like this instead:

fragment INT: '0';

The important points here are that it should be a fragment rule 
(since you don't want ANTLR to try to match it as a token by 
itself, you just want to create a token id that you can refer to 
from other rules), and that its actual contents don't really 
matter unless you're using it within the matching side of another 
lexer rule (but they can't be empty or you'll get another warning).
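
For what it's worth, here is a minimal sketch of how such a token 
id might then be emitted from another lexer rule via explicit 
code.  It assumes the Java target; the grammar name and the FLOAT 
and NUMBER rules are hypothetical and only there for illustration.

```antlr
lexer grammar NumberSketch;   // hypothetical grammar name

// Fragment rules used only to reserve token ids.  Their bodies are
// placeholders and are never referenced on the matching side.
fragment INT   : '0' ;
fragment FLOAT : '0' ;

// The one rule that actually matches digits; the actions decide
// which token id gets emitted.
NUMBER
    : ('0'..'9')+
      ( '.' ('0'..'9')+ { $type = FLOAT; }
      |                 { $type = INT; }
      )
    ;
```

A parser using this lexer would refer to INT and FLOAT as usual, 
without knowing that a single lexer rule produces both.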