[antlr-interest] Why and how exactly does ANTLR manage to fail on non recursive grammar for finite language?
Nikolay Ognyanov
nikolay.ognyanov at travelstoremaker.com
Wed Aug 12 15:28:16 PDT 2009
Jim Idle wrote:
> Nikolay Ognyanov wrote:
>> Hi everybody,
>>
>> Please have a look at a toy grammar for language consisting of 2
>> statements :
>>
>> grammar Ambiguous;
>>
>> expr
>> : expr1
>> | expr2
>> ;
>> expr1
>> : PREFIX_1 expr2 SUFFIX
>> ;
>> expr2
>> : PREFIX_2
>> | PREFIX_2 SUFFIX
>> ;
>>
>> PREFIX_1 : 'prefix_1';
>> PREFIX_2 : 'prefix_2';
>> SUFFIX : 'suffix';
>> WS : (' ' | '\r' | '\n' | '\t')+ {$channel=HIDDEN;};
>>
>>
>> Please do not advise how to fix it :) I know that but the question is why
>> ANTLR considers rule for expr2 ambiguous? Here is a tool run:
>>
>> java org.antlr.Tool Ambiguous.g
>> warning(200): Ambiguous.g:11:5: Decision can match input such as
>> "PREFIX_2 {EOF, SUFFIX}" using multiple alternatives: 1, 2
>> As a result, alternative(s) 2 were disabled for that input
>> error(201): Ambiguous.g:11:5: The following alternatives can never be
>> matched: 2
>>
> Clearly, this is ambiguous because you follow expr2 with SUFFIX in the
> expr1 rule. When parsing expr2 as part of expr1, the paths of taking
> /PREFIX_2 the using SUFFIX in expr1/ and taking PREFIX_2 then SUFFIX
> in expr2 followed by SUFFIX in expr1 are ambiguous as written.
>
> You say you know how to fix it, but perhaps you do not if you cannot
> see the ambiguity here. Because expr2 is a stand alone path from expr,
> there is no way to know how to disambiguate when called from expr1.
> Basically, you need to left factor these rules properly:
>
> expr
> : PREFIX_1 PREFIX_2 SUFFIX SUFFIX?
> | PREFIX_2 SUFFIX?
> ;
>
>
>
> Jim
>
>
> ------------------------------------------------------------------------
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
By definition a grammar is ambiguous if there is more than 1 way to
derive at least one statement.
This is not the case here. There is only 1 way to derive the offending
statement
PREFIX_1 PREFIX_2 SUFFIX SUFFIX
And it would take the procedure for expr2 just about 3 tokens lookahead
to figure out what is the
right thing to do. The question is why ANTLR does not do this?
Regards
Nikolay
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20090813/c7ca1577/attachment.html
More information about the antlr-interest
mailing list