[antlr-interest] Literals and subrules
Gavin Lambert
antlr at mirality.co.nz
Fri Feb 26 13:42:01 PST 2010
At 08:18 27/02/2010, Kenneth Domino wrote:
>stuff:
> 'a' .. 'z' | 'A' .. 'Z'
> ;
>
>The answer was buried deep in the book "The Definitive ANTLR
>Reference", ISBN-10: 0-9787392-5-6, Version: 2010-2-4,
>page 95 (section Element Sets). It is only valid for lexers, but
>was used in a parser context. So, it's illegal input, but a bug
>with the tool as well.

ANTLR 3's error detection is a little thin on the ground,
apparently partly because it still uses ANTLR 2 under the covers
to parse input grammars. Hopefully the next version will be
better in this regard.

However, one thing that you should be aware of is that even though
('a'..'z') might be a valid construct at the parser level, it does
not mean what you think it means. If used in the lexer, this
means "all characters between 'a' and 'z', inclusive". If used at
the parser level, though, it would mean "all *tokens* between 'a'
and 'z', inclusive", that is, every token whose type number falls
between those of the two literal tokens. The results would be a
bit unpredictable. Assuming that you haven't referenced those
tokens before, most likely it would match only those two tokens
and no others; it's possible, though, that it could match a whole
pile of completely unrelated tokens. It therefore makes no sense
to use this construct at the parser level, even if ANTLR did
support it.
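
As a rough sketch of where the range belongs (the grammar name
Letters and the token name LETTER are made up for illustration,
they are not from the original grammar), the character range goes
in a lexer rule and the parser rule refers to the resulting token
by name:

```antlr
grammar Letters;   // hypothetical grammar name

// Parser rule (lowercase name): matches one LETTER token.
// No '..' range here, since the parser sees tokens, not characters.
stuff
    : LETTER
    ;

// Lexer rule (uppercase name): '..' ranges over characters,
// inclusive at both ends.
LETTER
    : 'a'..'z' | 'A'..'Z'
    ;
```

Whether a single LETTER token per character is what you want
depends on the rest of the grammar; the point is only that the
range lives on the lexer side.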

All of this is an offshoot of a fundamental confusion between
quoted literals at lexer level (representing a sequence of
characters) and at parser level (representing a single unnamed
token). Despite their apparent convenience, when starting out
with ANTLR it is usually best to avoid using quoted literals in
the parser at all; it's much easier to accidentally break
something or miss possible ambiguity when using them, since they
effectively create hidden lexer rules of their own.
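
A minimal sketch of the literal-free style, assuming a toy
language with 'begin'/'end' blocks (the grammar name and every
rule and token name here are invented for illustration):

```antlr
grammar Demo;   // hypothetical grammar name

// With literals, the parser rule would read
//     block : 'begin' stmt* 'end' ;
// and ANTLR would define tokens for 'begin' and 'end' behind the
// scenes. Naming the tokens makes the lexer rules visible instead.
block
    : BEGIN stmt* END
    ;

stmt
    : ID SEMI
    ;

// Keyword rules are listed before ID so that the overlap between
// them is in plain sight and the keywords come first.
BEGIN : 'begin' ;
END   : 'end' ;
SEMI  : ';' ;
ID    : ('a'..'z' | 'A'..'Z')+ ;

// Whitespace is sent to the hidden channel (Java target syntax).
WS    : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With every token spelled out like this, it is easier to see at a
glance which lexer rules can match the same input.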