[antlr-interest] Why and how exactly does ANTLR manage to fail on non recursive grammar for finite language?

David-Sarah Hopwood david-sarah at jacaranda.org
Wed Aug 12 19:24:47 PDT 2009


Kevin J. Cummings wrote:
> On 08/12/2009 05:45 PM, Nikolay Ognyanov wrote:
>> Hi everybody,
>>
>> Please have a look at a toy grammar for  language consisting of 2 
>> statements :
>>
>> grammar Ambiguous;
>>
>> expr
>>     : expr1
>>     | expr2
>>     ;
>> expr1
>>     : PREFIX_1 expr2 SUFFIX
>>     ;
>> expr2
>>     : PREFIX_2
>>     | PREFIX_2 SUFFIX
>>     ;
>>
>> PREFIX_1   : 'prefix_1';
>> PREFIX_2   : 'prefix_2';
>> SUFFIX     : 'suffix';
>> WS         : (' ' | '\r' | '\n' | '\t')+ {$channel=HIDDEN;};
>>
>>
>> Please do not advise how to fix it :) I know that but the question is why
>> ANTLR considers rule for expr2 ambiguous?  Here is a tool run:
> 
> Because at the time it is in expr2, it is ambiguous as to whether it
> should match PREFIX_2 and exit, or match PREFIX_2 SUFFIX and then exit.

Yes.

> You only have 1 token of lookahead (by default).

That's not right. The value of k is normally determined automatically
per rule (if not overridden). The problem here is that the grammar is
ambiguous *as an LL(k) grammar* for any k. Although ANTLR supports
backtracking (which makes it more powerful than LL(k)), that's not
enabled by default, and so is not taken into account by the ambiguity
warning.

For example, consider input starting with "prefix_1 prefix_2 suffix".
A non-backtracking LL(k) parser must predict and commit to a particular
alternative of expr2 by looking only at tokens that would be part of that
alternative, here either "prefix_2" or "prefix_2 suffix". It will never
look further ahead than that, regardless of k.

The default of not enabling backtracking is the right default, since the
ambiguity warnings are more conservative and more likely to help in
finding grammar errors. (To me this is more important than the loss
of efficiency that can result from backtracking.) In this case, the
left-factoring suggested by Jim Idle is a better solution than enabling
backtracking, even though the latter would handle this particular
grammar automatically.

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com



More information about the antlr-interest mailing list