[antlr-interest] The Classic else (Non-)Ambiguity

Mon May 3 11:14:47 PDT 2010

You only have 100% confidence in it because that's how you define the
semantics for the if/else ambiguity for your language. The construct
itself is ambiguous when if/else doesn't require some kind of block
specification (like C/C++/Java).

Some other language could define the semantics to intentionally match
the else to the _outermost_ if. (I think that would be a _really_ bad
idea and confuse a lot of people, but we shouldn't be dictating
language definition.)

That said, we could define a special
"if/else-uses-standard-resolution" option, but isn't that really
similar to being explicit with predicates? Why introduce an explicit
option for one construct that not all languages need?

-- Scott

----------------------------------------
Scott Stanchfield
http://javadude.com

On Mon, May 3, 2010 at 1:56 PM, Ron Burk <ronburk at gmail.com> wrote:
> Something that has been bugging me:
>
> Why do virtually all parser generators produce a warning
> for the classic case of if-else ambiguity? It's only technically
> ambiguous: there is absolutely only one resolution to the
> ambiguity that could possibly make any sense at all,
> so warning the user that an ambiguity exists just
> wastes the users time -- this is a class of ambiguity
> for which the parser generator can make the correct
> choice with 100% confidence.
>
> Indeed, it wastes a huge amount of user time even
> if you only measure the time spent posting and answering
> the same "why is this else ambiguous" question over
> and over again (as a quick Google will show). This is
> not, as is often claimed, a situation where using a
> default of "consuming more input" is "usually what
> you want". This particular example is a case where
> there is only one possible choice that could be right.
>
> Part of the problem is people keep explaining the problem
> utterly incorrectly. For example, the yacc manual goes
> into the if-else ambiguity in great detail, and claims
> that the problem is that the else can bind to either
> an inner or outer if. That is wrong. The else can only
> bind to the inner if.
>
> One can demonstrate that point by taking the example
> if-else grammar from the yacc manual and then using
> %prec to force it to choose to reduce instead of shift
> when it encounters an "else" token. The result is a parser
> that produces a syntax error whenever it sees an "else"
> token -- no "else" token can ever be consumed. There
> were never really two possible choices that could
> make sense, there was never any possibility of matching
> an "else" to an outer if. There is only one *valid* choice
> between the two *technically* ambiguous choices since
> the other alternative is, as the Dragon Book says,
> "surely wrong".
>
> There is no doubt that this particular "ambiguity" can
> be resolved with 100% confidence 100% automatically,
> without wasting endless hours of user time by informing
> them that it was "ambiguous". The only interesting question
> is, what is the general category this case belongs to?
>
> Roughly speaking, I think the category is simply those
> situations where ambiguity arises because, effectively,
> you have the same optional non-terminal adjacent to
> itself:
>        N? N?
>
> For example, in this grammar:
>
> %token IF ELSE EXPR
> %%
>
> stmt
>    : if
>    | EXPR
>    ;
>
> if
>    : IF EXPR stmt else_opt
>    ;
>
> else_opt    /* optional else */
>    : ELSE stmt
>    |
>    ;
>
> The ambiguity could be identified (by either a top-down
> or bottom up parser generator) as occurring in the "if"
> production right after "stmt" has been recognized.
> However, it can be automatically determined that
> the cause is the fact that "stmt" can end with an
> optional else, putting two optional else nonterminals
> adjacent to each other, which matches the previously
> mentioned category -- therefore this is a false
> ambiguity.
>
> One can see the problem simplified here:
>
> %token DIGIT
> %%
> digits
>    : '(' DIGIT digit_opt digit_opt ')'
> digit_opt  /* optional digit */
>    : DIGIT
>    | /* epsilon */
>    ;
>
> Only one choice for the "ambiguity" can possibly be right;
> the alternative is that the parser would never be able to
> match any of the optional digits. It will always be the case
> when you have two adjacent optional non-terminals that
> there is a technically ambiguous choice between either
> consuming input, or rendering illegal a token that obviously
> should be legal at that point. The correct choice is always
> clear, and the alternative could never make sense.
>
> There are many other (at least vaguely related)
> cases where a warning absolutely should be
> issued, of course:
>     N* N*
> for example, or
>    N* N
> or even
>    N? O?
> where FIRST(N) and FIRST(O) intersect (I'm all for
> warning if there's the slightest possibility the user
> isn't going to understand what's going to happen).
> But for this category I've identified as pseudo
> ambiguities, there is no possibility of user confusion,
> there is no actual alternative that could ever
> possibly make any sense. It is a mere curiosity,
> not a true ambiguity.
>
> IMHO, modern parser generators should not be
> wasting enormous amounts of total user time by
> warning about this class of pseudo ambiguities.
> And modern parser generator documentation
> should not repeat the utterly incorrect if-else
> explanation that implies there is a different
> choice that produces anything sensible
> whatsoever.
>
> I feel better now. :-)
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>