[antlr-interest] Why and how exactly does ANTLR manage to fail on non recursive grammar for finite language?

Wed Aug 12 14:45:02 PDT 2009

Hi everybody,

Please have a look at a toy grammar for  language consisting of 2 
statements :

grammar Ambiguous;

expr
    : expr1
    | expr2
    ;
expr1
    : PREFIX_1 expr2 SUFFIX
    ;
expr2
    : PREFIX_2
    | PREFIX_2 SUFFIX
    ;

PREFIX_1   : 'prefix_1';
PREFIX_2   : 'prefix_2';
SUFFIX     : 'suffix';
WS         : (' ' | '\r' | '\n' | '\t')+ {$channel=HIDDEN;};

Please do not advise how to fix it :) I know that but the question is why
ANTLR considers rule for expr2 ambiguous?  Here is a tool run:

java org.antlr.Tool Ambiguous.g
warning(200): Ambiguous.g:11:5: Decision can match input such as 
"PREFIX_2 {EOF, SUFFIX}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): Ambiguous.g:11:5: The following alternatives can never be 
matched: 2

Or with reversed order of alternatives;

java org.antlr.Tool Ambiguous.g
warning(200): Ambiguous.g:11:5: Decision can match input such as 
"PREFIX_2 SUFFIX" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

The language is :

PREFIX_1 PREFIX_2 SUFFIX
PREFIX_1 PREFIX_2 SUFFIX SUFFIX

and none of the statements is really ambiguous per the grammar above. 
The only kind
of "ambiguity" I see is that second statement can be parsed as the first 
on plus some
extra input after end of program. And the only answer to the question I 
know is that
this is just how it goes with recursive descent parsers - they are OK 
with leaving extra
input after end of valid program in the input channel. I would not be 
surprised if this
is  the real answer but it is still interesting to know when and how 
exactly ANTLR gets
to this. There are real world problems of the kind which are much more 
difficult to
identify and fix than in the toy grammar above, so if anybody can advise 
of the more
detailed mechanics of how this happens with ANTLR, I would be grateful. 
And also -
of a way (other than the hard way:) to verify that this is or is the 
problem in real world
grammars when suspicious ambiguity is reported.

Regards
Nikolay