[antlr-interest] Why and how exactly does ANTLR manage to fail on non recursive grammar for finite language?

Nikolay Ognyanov nikolay.ognyanov at travelstoremaker.com
Wed Aug 12 15:54:27 PDT 2009



Jim Idle wrote:
> Nikolay Ognyanov wrote:
>>
>>
>> Jim Idle wrote:
>>> Nikolay Ognyanov wrote:
>>>> Hi everybody,
>>>>
>>>> Please have a look at a toy grammar for  language consisting of 2 
>>>> statements :
>>>>
>>>> grammar Ambiguous;
>>>>
>>>> expr
>>>>     : expr1
>>>>     | expr2
>>>>     ;
>>>> expr1
>>>>     : PREFIX_1 expr2 SUFFIX
>>>>     ;
>>>> expr2
>>>>     : PREFIX_2
>>>>     | PREFIX_2 SUFFIX
>>>>     ;
>>>>
>>>> PREFIX_1   : 'prefix_1';
>>>> PREFIX_2   : 'prefix_2';
>>>> SUFFIX     : 'suffix';
>>>> WS         : (' ' | '\r' | '\n' | '\t')+ {$channel=HIDDEN;};
>>>>
>>>>
>>>> Please do not advise how to fix it :) I know that but the question is why
>>>> ANTLR considers rule for expr2 ambiguous?  Here is a tool run:
>>>>
>>>> java org.antlr.Tool Ambiguous.g
>>>> warning(200): Ambiguous.g:11:5: Decision can match input such as 
>>>> "PREFIX_2 {EOF, SUFFIX}" using multiple alternatives: 1, 2
>>>> As a result, alternative(s) 2 were disabled for that input
>>>> error(201): Ambiguous.g:11:5: The following alternatives can never be 
>>>> matched: 2
>>>>   
>>> Clearly, this is ambiguous because you follow expr2 with SUFFIX in 
>>> the expr1 rule. When parsing expr2 as part of expr1, the paths of 
>>> taking /PREFIX_2 the using SUFFIX in expr1/ and taking PREFIX_2 then 
>>> SUFFIX in expr2 followed by SUFFIX in expr1 are ambiguous as written.
>>>
>>> You say you know how to fix it, but perhaps you do not if you cannot 
>>> see the ambiguity here. Because expr2 is a stand alone path from 
>>> expr, there is no way to know how to disambiguate when called from 
>>> expr1. Basically, you need to left factor these rules properly:
>>>
>>> expr
>>>    : PREFIX_1 PREFIX_2 SUFFIX SUFFIX?
>>>    | PREFIX_2 SUFFIX?
>>>    ;
>>>
>>>
>>>
>>> Jim
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>   
>> By definition a grammar is ambiguous if there is more than 1 way to 
>> derive at least one statement.
>> This is not the case here. There is only 1 way to derive the 
>> offending statement
>>
>> PREFIX_1 PREFIX_2 SUFFIX SUFFIX
>>
>> And it would take the procedure for expr2 just about 3 tokens 
>> lookahead to figure out what is the
>> right thing to do.
>> The question is why ANTLR does not do this?
> That isn't the definition of an ambiguous grammar in terms of what 
> ANTLR can anaylzye (at least at the moment.) How do you expect ANTLR 
> to generate disambiguation in  expr2 when called from expr1 but handle 
> being called from expr as well, where as you have the expr rule right 
> now it could be followed by SUFFIX and the parse can just stop. Sure, 
> as humans we can see that in this particular case it could be done if 
> we make assumptions about follow sets, but for the current analysis 
> algorithms, this is ambiguous.
>
> As I said, you need to left factor your rules - your grammar will be 
> much more readable that way too. You could also just conflate expr2 
> into expr1 and the lookahead predictions you think it should do now 
> would actually happen. However, that way your are increasing the 
> complexity.
>
> Consider:
>
> x : P1 P2? P2 ;
>
> vs
>
> x : P1 P2 P2?
>
> as the simplistic case, then when P2 is a series of tokens, and so on.
>
> Jim
> ------------------------------------------------------------------------
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>   
Thank you, Jim. I really do know how to revamp the grammar. At least in 
such toy cases :) What
I am trying to understand better is in what situations ANTLR will fail 
to do otherwise doable by
recursive decent parser things. Your comment about competition between 
parsing of standalone
and embedded exp2 gives me something to think about. I am not really 
sure I can learn much
more than this without diving into ANLR inernals but I am giving it a try :)

Regards
Nikolay







-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20090813/e4e3c03c/attachment.html 


More information about the antlr-interest mailing list