[antlr-interest] Why don't parsers support character ranges?

Wed Apr 23 05:21:06 PDT 2008

Hannes Schmidt wrote:
> Hi all,
> 
> I would like to use character ranges in a parser as illustrated in the 
> following example (a very reduced version of my real-world grammar):
> 
> grammar test1;
> foo : before '@' after;
> before : 'a'..'z';
> after : 'm'..'z';
> 
> ANTLR generates a parser that ignores the range as if the grammar were
> 
> grammar test2;
> foo : before '@' after;
> before : ;
> after : ;
> 
> IOW, the generated parser fails to parse the input "a@m". If I break the
> grammar up into a lexer and a parser as in
> 
> grammar test3;
> foo : BEFORE '@' AFTER;
> BEFORE : 'a'..'z';
> AFTER : 'm'..'z';
> 
> the generated code fails to parse "a@m" with a MismatchedTokenException 
> at the 'm'. This is because ANTLR silently prioritizes BEFORE even 
> though its set of characters intersects that of AFTER. Swapping BEFORE 
> and AFTER would generate a parser that fails to recognize "m@m".

You could alternatively use:

grammar test4;
foo : BEFORE '@' AFTER;
BEFORE : A_TO_L | M_TO_Z;
AFTER : M_TO_Z;
fragment A_TO_L: 'a'..'l';
fragment M_TO_Z: 'm'..'z';

But I suppose it is easier to produce good error messages if you let 
AFTER accept 'a'..'l' as well and check the restriction in a later stage:

grammar test5;
foo : ALPHA '@' ALPHA;
ALPHA: 'a'..'z';
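
The later-stage check could look roughly like the sketch below. It is not ANTLR-generated code: check_foo and its list-of-strings input are hypothetical names made up for illustration, and it assumes the token texts matched by foo have already been collected. The same test could just as well live in a grammar action or in a tree walker.

```python
def check_foo(tokens):
    """Validate the token texts matched by `foo : ALPHA '@' ALPHA;`.

    `tokens` is a hypothetical list of three strings, e.g. ["a", "@", "m"].
    Returns None if the input is acceptable, otherwise an error message.
    """
    if len(tokens) != 3 or tokens[1] != "@":
        return "expected ALPHA '@' ALPHA"
    before, _, after = tokens
    # Both sides must be a single lowercase letter (what ALPHA matches).
    for name, text in (("before", before), ("after", after)):
        if not (len(text) == 1 and "a" <= text <= "z"):
            return f"{name}: expected a letter in 'a'..'z', got {text!r}"
    # The restriction the lexer can no longer express: after must be 'm'..'z'.
    if not ("m" <= after <= "z"):
        return f"after: expected a letter in 'm'..'z', got {after!r}"
    return None
```

With this, "a@m" and "m@m" both pass, while "a@b" is rejected with a message that names the offending letter instead of a generic token mismatch.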

Johannes