[antlr-interest] Why don't parsers support character ranges?
Hannes Schmidt
antlr5 at hannesschmidt.net
Tue Apr 22 19:16:16 PDT 2008
Hi all,
I would like to use character ranges in a parser as illustrated in the
following example (a very reduced version of my real-world grammar):
grammar test1;
foo : before '@' after;
before : 'a'..'z';
after : 'm'..'z';
ANTLR generates a parser that ignores the range as if the grammar were
grammar test2;
foo : before '@' after;
before : ;
after : ;
IOW, the grammar fails to parse the input "a@m". If I break the grammar
up into a lexer and a parser as in
grammar test3;
foo : BEFORE '@' AFTER;
BEFORE : 'a'..'z';
AFTER : 'm'..'z';
the generated code fails to parse "a@m" with a MismatchedTokenException
at the 'm'. This is because ANTLR silently prioritizes BEFORE even
though its set of characters intersects that of AFTER. Swapping BEFORE
and AFTER would generate a parser that fails to recognize "m@m".
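To make the tie-breaking concrete, here is a small Python sketch of a toy lexer that tries rules in declaration order. It is only a model of the behaviour I am observing, not ANTLR's generated code; the names RULES and tokenize are made up for this illustration, and the AT rule stands in for the implicit '@' token.

```python
# Toy model: rules are tried in declaration order and the first rule whose
# character range contains the current character wins.
# Illustration only; this is not how ANTLR's generated lexer is written.
RULES = [
    ("BEFORE", "a", "z"),
    ("AFTER", "m", "z"),
    ("AT", "@", "@"),  # stand-in for the implicit '@' token
]


def tokenize(text, rules=RULES):
    """Return a list of (token_name, character) pairs for single-char tokens."""
    tokens = []
    for ch in text:
        for name, lo, hi in rules:
            if lo <= ch <= hi:
                tokens.append((name, ch))
                break
        else:
            raise ValueError(f"no rule matches {ch!r}")
    return tokens


# The 'm' is labelled BEFORE, so a parser expecting AFTER rejects it.
print(tokenize("a@m"))
```

With the first two entries of RULES swapped, the leading 'm' of "m@m" comes out as AFTER, which is the other failure described above.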
So here are my questions:
Why can't I use ranges in parsers?
Why doesn't ANTLR emit a warning when it ignores ranges in grammar rules?
How can I emulate the missing range feature without obfuscating my
grammar too much? Semantic predicates?
Now let me put my tinfoil hat on and theorize a little bit: I think that
the root cause of my confusion is ANTLR's distinction between lexer and
parser. I think this distinction is purely historical and ANTLR might be
better off without it. When writing grammars, I often find myself in
situations where I know that certain lexer rules make sense in a certain
parser context only but that context is not available to the lexer
because the state that defines it is maintained in the parser.
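As a rough sketch of what I mean, here is the example grammar handled without a separate lexer, written as a tiny hand-rolled recursive descent recognizer in Python. The names parse_foo and expect_range are hypothetical and this is not something ANTLR provides; the point is only that each character range is checked in the context where the parser expects it.

```python
def parse_foo(text):
    """Recognize  foo : before '@' after;
    with  before : 'a'..'z'  and  after : 'm'..'z',
    working directly on characters (no token stream)."""
    pos = 0

    def expect_range(lo, hi):
        # Consume one character if it lies in [lo, hi]; otherwise fail.
        nonlocal pos
        if pos < len(text) and lo <= text[pos] <= hi:
            pos += 1
        else:
            found = text[pos] if pos < len(text) else "end of input"
            raise ValueError(
                f"expected {lo!r}..{hi!r} at offset {pos}, found {found!r}"
            )

    expect_range("a", "z")  # before
    expect_range("@", "@")  # literal '@'
    expect_range("m", "z")  # after
    if pos != len(text):
        raise ValueError(f"unexpected trailing input at offset {pos}")
    return True


# Both inputs from the examples above are accepted, because 'm' is judged
# against the range that applies at its position.
print(parse_foo("a@m"), parse_foo("m@m"))
```

An input such as "a@a" is rejected at the last character, since 'a' lies outside 'm'..'z'.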
I fondly remember my CS101 classes when we wrote recursive descent
parsers for LL(*) in Opal (a functional language similar to Haskell). We
didn't have to distinguish between lexer and parser and it felt very
liberating. ;-)