[antlr-interest] The Classic else (Non-)Ambiguity

Ron Burk ronburk at gmail.com
Tue May 4 09:00:04 PDT 2010


> Sure - the way we normally resolve the ambiguity is (probably) the
> only reasonable way to do so. The issue is really who makes that
> decision.

No, the issue is that ANTLR is complaining about a conflict between
two choices, one of which produces a parser that is broken, that
cannot accept languages that conform to the grammar specification at
all. I really can't tell if you understand that choosing the
alternative that ANTLR decided not to choose (the "conflict" it is
actually complaining about, rather than the grammar ambiguity) results
in a parser that cannot recognize any "else" statement at all and is
therefore completely broken.

I think from what you're saying that you imagine ANTLR is complaining
about the grammar ambiguity rather than a conflict in constructing
tables. It does not have a grammar ambiguity detector.

>  It's far too easy to make mistakes when writing rules;
> grammars are not easy!

That's true. And it's made even harder by tools that claim the problem
is they couldn't choose between A and B, where B would produce a
broken result that is completely illegal (not a matter of taste or
choice, but a complete failure to recognize the specified language)
and violates the entire premise of the tool. The warning you think is
informing users is actually misleading them. The grammar ambiguity
does not arise from any different possibilities of how else could
bind, it arises from non-deterministic behavior, which nobody using a
tool that creates deterministic parsers could ever want.

> It doesn't matter _how_ such a feature could be implemented;

Actually, I think if you did try to implement such a feature, you
would begin to grasp why the ambiguity formed by N? N? will always
have only one deterministic parser implementation that doesn't violate
the grammar. Happily, it is also INVARIABLY exactly what the human
writing the grammar intended.

But at least seeing that the unambiguous grammar that represents the
other choice bears no resemblance to this one should help make it
clear that parser generators report parser table conflicts, not
grammar ambiguities. The conflict being reported in this case does not
at all correspond between two different ambiguous interpretations of
the input grammar. It corresponds to the only deterministic choice for
this grammar, or producing a broken parser that no longer accepts the
language.

And, if you think carefully (trying to root out of your brain decades
of repetition of the same wrong explanation of what's going on in this
case), you'll see that at the grammar level (high-level, abstract
specification), the ambiguity is one of non-determinism. Is it true
that the grammar I gave can derive a sentence in which else binds to
the outermost if? Absolutely. How does it do that? By taking one
ambiguous branch on one occasion and the other ambiguous branch on
another occasion! Something that no deterministic parser can do.
Something that never can make sense to a human ("yeah, my language
accepts else statements sometimes, but other times it just doesn't
feel like it.").

So, the ambiguity has nothing to do with how else "binds" and it is
not the least bit informative to claim it does to users -- this
grammar gives no clue whatsoever how to create an unambiguous grammar
that produces a different "binding" for else, because that is a
completely unrelated problem. No, the ambiguity is of the specific
form N? N?, which always has exactly one meaning to humans, and
exactly one deterministic implementation that does not violate the
grammar specification. In essence, the ambiguity is a choice between
non-deterministic behavior and deterministic behavior.

The nature of the if-else grammar ambiguity is identical to that of
the "digits" example I gave in the original post. When two optional
instances of the same non-terminal are adjacent, a non-deterministic
parser could magically decide to NOT match the next digit to the first
optional non-terminal, then magically make the OPPOSITE choice for the
second optional non-terminal. That is the source of the ambiguity.
Magical behavior that humans don't want and deterministic tools cannot
implement.

The non-deterministic alternative (eat an else sometimes, other times
magically decide not to in the exact same situation) does not
correspond to a different "binding"  for else. It is not generally of
any use to humans. It is not something a deterministic parser
generator can create. Humans do not select a tool for creating
deterministic parsers and ever expect it to produce non-deterministic
results.

When deterministic parser generators encounter this precise form of
grammar ambiguity, they arrive at a "conflict" between two choices.
One choice is the only deterministic solution to the grammar
specification. The other choice completely discards some legal syntax,
resulting in a parser that is flat-out broken. There is never any good
reason to warn the user that your deterministic parser generator has
decided to create a deterministic parser instead of a
non-deterministic one. But yeah, if you want to claim that a warning
message must be emitted, then the correct one is something like: "This
grammar has a non-deterministic interpretation; I chose the
deterministic one, the only choice I am capable of making, the only
choice that makes any sense to humans."

IMHO.

======== ifelse.lex  ==============
%option noyywrap
%%
if  { return IF; }
else    { return ELSE; }

[0-9]+    { return cond; }

\{\}        { return STMT; }


[ \t\n]*   ;

.       { return yytext[0]; }
============ ifelse.y ============
/* force yacc to create a broken parser by choosing
 * the non-default choice for an if-else "ambiguity"
 */
%token IF ELSE cond STMT

%left ELSE
%left IF

%%

stat    :       IF  '('  cond  ')'  stat %prec IF
    { printf("Got an If with no else!\n"); }
        |       IF  '('  cond  ')'  stat  ELSE  stat %prec IF
    { printf("Got an If with else!\n"); }
        |       STMT
;

%%
#include "lex.yy.c"

void main(void)
    {
    yydebug = 1;
    yyparse();
    }

int  yyerror (char* s)
    {
    fprintf (stderr, "%s\n", s);
    }
============ output  ============
ronburk at alpha ~/yacc $ ./a.out
Starting parse
Entering state 0
Reading a token: if(0)if(1)else
Next token is token IF ()
Shifting token IF ()
Entering state 1
Reading a token: Next token is token '(' ()
Shifting token '(' ()
Entering state 4
Reading a token: Next token is token cond ()
Shifting token cond ()
Entering state 6
Reading a token: Next token is token ')' ()
Shifting token ')' ()
Entering state 7
Reading a token: Next token is token IF ()
Shifting token IF ()
Entering state 1
Reading a token: Next token is token '(' ()
Shifting token '(' ()
Entering state 4
Reading a token: Next token is token cond ()
Shifting token cond ()
Entering state 6
Reading a token: Next token is token ')' ()
Shifting token ')' ()
Entering state 7
Reading a token: Next token is token ELSE ()
syntax error, unexpected ELSE, expecting IF or STMT
Error: popping token ')' ()
Stack now 0 1 4 6 7 1 4 6
Error: popping token cond ()
Stack now 0 1 4 6 7 1 4
Error: popping token '(' ()
Stack now 0 1 4 6 7 1
Error: popping token IF ()
Stack now 0 1 4 6 7
Error: popping token ')' ()
Stack now 0 1 4 6
Error: popping token cond ()
Stack now 0 1 4
Error: popping token '(' ()
Stack now 0 1
Error: popping token IF ()
Stack now 0
Cleanup: discarding lookahead token ELSE ()
Stack now 0


More information about the antlr-interest mailing list