[antlr-interest] Why and how exactly does ANTLR manage to fail on non recursive grammar for finite language?

Wed Aug 12 15:28:16 PDT 2009

Jim Idle wrote:
> Nikolay Ognyanov wrote:
>> Hi everybody,
>>
>> Please have a look at a toy grammar for  language consisting of 2 
>> statements :
>>
>> grammar Ambiguous;
>>
>> expr
>>     : expr1
>>     | expr2
>>     ;
>> expr1
>>     : PREFIX_1 expr2 SUFFIX
>>     ;
>> expr2
>>     : PREFIX_2
>>     | PREFIX_2 SUFFIX
>>     ;
>>
>> PREFIX_1   : 'prefix_1';
>> PREFIX_2   : 'prefix_2';
>> SUFFIX     : 'suffix';
>> WS         : (' ' | '\r' | '\n' | '\t')+ {$channel=HIDDEN;};
>>
>>
>> Please do not advise how to fix it :) I know that but the question is why
>> ANTLR considers rule for expr2 ambiguous?  Here is a tool run:
>>
>> java org.antlr.Tool Ambiguous.g
>> warning(200): Ambiguous.g:11:5: Decision can match input such as 
>> "PREFIX_2 {EOF, SUFFIX}" using multiple alternatives: 1, 2
>> As a result, alternative(s) 2 were disabled for that input
>> error(201): Ambiguous.g:11:5: The following alternatives can never be 
>> matched: 2
>>   
> Clearly, this is ambiguous because you follow expr2 with SUFFIX in the 
> expr1 rule. When parsing expr2 as part of expr1, the paths of taking 
> /PREFIX_2 the using SUFFIX in expr1/ and taking PREFIX_2 then SUFFIX 
> in expr2 followed by SUFFIX in expr1 are ambiguous as written.
>
> You say you know how to fix it, but perhaps you do not if you cannot 
> see the ambiguity here. Because expr2 is a stand alone path from expr, 
> there is no way to know how to disambiguate when called from expr1. 
> Basically, you need to left factor these rules properly:
>
> expr
>    : PREFIX_1 PREFIX_2 SUFFIX SUFFIX?
>    | PREFIX_2 SUFFIX?
>    ;
>
>
>
> Jim
>
>
> ------------------------------------------------------------------------
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>   
By definition a grammar is ambiguous if there is more than 1 way to 
derive at least one statement.
This is not the case here. There is only 1 way to derive the offending 
statement

PREFIX_1 PREFIX_2 SUFFIX SUFFIX

And it would take the procedure for expr2 just about 3 tokens lookahead 
to figure out what is the
right thing to do. The question is why ANTLR does not do this?

Regards
Nikolay

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20090813/c7ca1577/attachment.html