[antlr-interest] Why and how exactly does ANTLR manage to fail on non recursive grammar for finite language?

Nikolay Ognyanov nikolay.ognyanov at travelstoremaker.com
Wed Aug 12 16:08:48 PDT 2009



Kevin J. Cummings wrote:
> On 08/12/2009 06:28 PM, Nikolay Ognyanov wrote:
>   
>> Jim Idle wrote:
>>     
>>> Nikolay Ognyanov wrote:
>>>       
>>>> Hi everybody,
>>>>
>>>> Please have a look at a toy grammar for  language consisting of 2 
>>>> statements :
>>>>
>>>> grammar Ambiguous;
>>>>
>>>> expr
>>>>     : expr1
>>>>     | expr2
>>>>     ;
>>>> expr1
>>>>     : PREFIX_1 expr2 SUFFIX
>>>>     ;
>>>> expr2
>>>>     : PREFIX_2
>>>>     | PREFIX_2 SUFFIX
>>>>     ;
>>>>
>>>> PREFIX_1   : 'prefix_1';
>>>> PREFIX_2   : 'prefix_2';
>>>> SUFFIX     : 'suffix';
>>>> WS         : (' ' | '\r' | '\n' | '\t')+ {$channel=HIDDEN;};
>>>>
>>>>
>>>> Please do not advise how to fix it :) I know that but the question is why
>>>> ANTLR considers rule for expr2 ambiguous?  Here is a tool run:
>>>>
>>>> java org.antlr.Tool Ambiguous.g
>>>> warning(200): Ambiguous.g:11:5: Decision can match input such as 
>>>> "PREFIX_2 {EOF, SUFFIX}" using multiple alternatives: 1, 2
>>>> As a result, alternative(s) 2 were disabled for that input
>>>> error(201): Ambiguous.g:11:5: The following alternatives can never be 
>>>> matched: 2
>>>>   
>>>>         
>>> Clearly, this is ambiguous because you follow expr2 with SUFFIX in the
>>> expr1 rule. When parsing expr2 as part of expr1, the paths of taking
>>> /PREFIX_2 the using SUFFIX in expr1/ and taking PREFIX_2 then SUFFIX
>>> in expr2 followed by SUFFIX in expr1 are ambiguous as written.
>>>
>>> You say you know how to fix it, but perhaps you do not if you cannot
>>> see the ambiguity here. Because expr2 is a stand alone path from expr,
>>> there is no way to know how to disambiguate when called from expr1.
>>> Basically, you need to left factor these rules properly:
>>>
>>> expr
>>>    : PREFIX_1 PREFIX_2 SUFFIX SUFFIX?
>>>    | PREFIX_2 SUFFIX?
>>>    ;
>>>
>>>
>>>
>>> Jim
>>>       
>
>   
>> By definition a grammar is ambiguous if there is more than 1 way to
>> derive at least one statement.
>> This is not the case here. There is only 1 way to derive the offending
>> statement
>>
>> PREFIX_1 PREFIX_2 SUFFIX SUFFIX
>>
>> And it would take the procedure for expr2 just about 3 tokens lookahead
>> to figure out what is the
>> right thing to do. The question is why ANTLR does not do this?
>>     
>
> Because you did not set "k=3" in your parser options?
> (At least that's what I would do in ANTLR 2.7.7....)
>
>   
By default ANTLR 3 looks roughly speaking as far ahead as it takes. 
Setting explicitly
high k does not help either.

Regards
Nikolay
>> Regards
>> Nikolay
>>     
>
>   

-- 

*Nikolay Ognyanov, PhD*
Chief Technology Officer
*TravelStoreMaker.com Inc.* <http://www.travelstoremaker.com/>
Phone: +359 2 933 3832
Fax:     +359 2 983 6475

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20090813/0c5e5724/attachment.html 


More information about the antlr-interest mailing list