[antlr-interest] Literals and subrules

Fri Feb 26 13:42:01 PST 2010

At 08:18 27/02/2010, Kenneth Domino wrote:
 >stuff:
 >        'a' .. 'z' | 'A' .. 'Z'
 >        ;
 >
 >The answer was buried deep in the book "The Definitive ANTLR
 >Reference", ISBN-10: 0-9787392-5-6, Version: 2010-2-4,
 >page 95 (section Element Sets).  It is only valid for lexers, but
 >was used in a parser context.  So, it's illegal input, but a bug
 >with the tool as well.

ANTLR 3's error detection is a little thin on the ground, 
apparently partly because it still uses ANTLR 2 under the 
covers (to parse input grammars).  Hopefully the next version will 
be better in this regard.

However, one thing that you should be aware of is that even though 
('a'..'z') might be a valid construct at the parser level, it does 
not mean what you think it means.  If used in the lexer, this 
means "all characters between 'a' and 'z', inclusive".  If used at 
the parser level, though, it would mean "all *tokens* between 'a' 
and 'z', inclusive".  The results would be a bit 
unpredictable.  Assuming that you haven't referenced those tokens 
before, most likely it would only match those two tokens and no 
others; it's possible though that it could match a whole pile of 
completely unrelated tokens.  It therefore makes no sense to use 
this construct at the parser level, even if ANTLR did support it.
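
As a rough sketch of the usual fix (the grammar name Letters and the 
token names LETTER and WS are made up here for illustration, and the 
WS action assumes the Java target), the range moves into a lexer rule 
and the parser refers to the resulting token by name:

```
grammar Letters;

// Parser rule (lowercase name): refers to the token by name only.
stuff
    :   LETTER+ EOF
    ;

// Lexer rule (uppercase name): here 'a'..'z' is a character range,
// i.e. every character from 'a' to 'z' inclusive.
LETTER
    :   'a'..'z'
    |   'A'..'Z'
    ;

// Hypothetical whitespace rule, sent to the hidden channel.
WS  :   (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; }
    ;
```

With this layout the range is only ever interpreted over characters, 
so the question of what a range of tokens would mean never comes up.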

All of this is an offshoot of a fundamental confusion between 
quoted literals at lexer level (representing a sequence of 
characters) and at parser level (representing a single unnamed 
token).  Despite their apparent convenience, when starting out 
with ANTLR it is usually best to avoid using quoted literals in 
the parser at all; it's much easier to accidentally break 
something or miss possible ambiguity when using them, since they 
effectively create hidden lexer rules of their own.
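
To illustrate, here is a small untested sketch of the explicit style 
(the grammar name Keywords and all rule names are hypothetical).  Each 
keyword gets its own named lexer rule instead of appearing as a quoted 
literal in the parser, so everything the lexer can produce is visible 
in one place:

```
grammar Keywords;

// The parser uses token names only; no quoted literals here.
stat
    :   IF ID THEN ID
    ;

// Explicit keyword tokens.  They are listed before ID so that, as far
// as I understand ANTLR 3's behaviour, the keyword is chosen when the
// input could match either rule.
IF      :   'if' ;
THEN    :   'then' ;

// Inside a lexer rule, quoted literals and ranges are about characters.
ID      :   ('a'..'z' | 'A'..'Z')+ ;

WS      :   (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

Writing 'if' directly in stat would instead have ANTLR create the 
token for you behind the scenes, which is where an overlap with ID is 
easy to miss.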