[antlr-interest] Why and how exactly does ANTLR manage to fail on non recursive grammar for finite language?

Wed Aug 12 15:57:20 PDT 2009

On 08/12/2009 06:28 PM, Nikolay Ognyanov wrote:
> 
> 
> Jim Idle wrote:
>> Nikolay Ognyanov wrote:
>>> Hi everybody,
>>>
>>> Please have a look at a toy grammar for  language consisting of 2 
>>> statements :
>>>
>>> grammar Ambiguous;
>>>
>>> expr
>>>     : expr1
>>>     | expr2
>>>     ;
>>> expr1
>>>     : PREFIX_1 expr2 SUFFIX
>>>     ;
>>> expr2
>>>     : PREFIX_2
>>>     | PREFIX_2 SUFFIX
>>>     ;
>>>
>>> PREFIX_1   : 'prefix_1';
>>> PREFIX_2   : 'prefix_2';
>>> SUFFIX     : 'suffix';
>>> WS         : (' ' | '\r' | '\n' | '\t')+ {$channel=HIDDEN;};
>>>
>>>
>>> Please do not advise how to fix it :) I know that but the question is why
>>> ANTLR considers rule for expr2 ambiguous?  Here is a tool run:
>>>
>>> java org.antlr.Tool Ambiguous.g
>>> warning(200): Ambiguous.g:11:5: Decision can match input such as 
>>> "PREFIX_2 {EOF, SUFFIX}" using multiple alternatives: 1, 2
>>> As a result, alternative(s) 2 were disabled for that input
>>> error(201): Ambiguous.g:11:5: The following alternatives can never be 
>>> matched: 2
>>>   
>> Clearly, this is ambiguous because you follow expr2 with SUFFIX in the
>> expr1 rule. When parsing expr2 as part of expr1, the paths of taking
>> /PREFIX_2 the using SUFFIX in expr1/ and taking PREFIX_2 then SUFFIX
>> in expr2 followed by SUFFIX in expr1 are ambiguous as written.
>>
>> You say you know how to fix it, but perhaps you do not if you cannot
>> see the ambiguity here. Because expr2 is a stand alone path from expr,
>> there is no way to know how to disambiguate when called from expr1.
>> Basically, you need to left factor these rules properly:
>>
>> expr
>>    : PREFIX_1 PREFIX_2 SUFFIX SUFFIX?
>>    | PREFIX_2 SUFFIX?
>>    ;
>>
>>
>>
>> Jim

> By definition a grammar is ambiguous if there is more than 1 way to
> derive at least one statement.
> This is not the case here. There is only 1 way to derive the offending
> statement
> 
> PREFIX_1 PREFIX_2 SUFFIX SUFFIX
> 
> And it would take the procedure for expr2 just about 3 tokens lookahead
> to figure out what is the
> right thing to do. The question is why ANTLR does not do this?

Because you did not set "k=3" in your parser options?
(At least that's what I would do in ANTLR 2.7.7....)

> Regards
> Nikolay

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)