[antlr-interest] The Classic else (Non-)Ambiguity

Tue May 4 14:32:50 PDT 2010

2010/5/4 Chris verBurg <cheetomonster at gmail.com>:
>
> I'm not sure my comments would be useful amongst all the parsing-theory
> heavyweights on this list, but here goes.  :)
>

That's not me ;-)

> 2010/5/4 Marcin Rzeźnicki <marcin.rzeznicki at gmail.com>
>>
>> On Tue, May 4, 2010 at 4:03 PM, Ron Burk <ronburk at gmail.com> wrote:
>>
>> Hi, I've been following this thread and I must say that I've been
>> scratching my head thinking about what you have been attempting to
>> prove.
>
> After my own bout of head-scratching, I think I finally realize what Ron's
> assertion is.  A parser that binds the "else" to the outermost "if" is
> unable to handle a rather simple case:
>     if A then
>         if B then
>             ...
>         else
>             ...
>     else
>         ...
> If that first "else" bound to the first "if", then the second "else" is a
> syntax error -- it's just sitting out in la-la land without an "if" to
> attach to.  In this case, the first "else" has to bind to the last "if".
>  Ron's assertion is that if you generalize that as the rule, then "if-else"
> isn't really ambiguous.
>

That's what backtracking is for IMO - yet parser needs to make local
decision at some point and no fixed lookahead is sure to be
sufficient. While there is more than one decision to be made
well-behaved parser generator should warn user. And no amount of
"we-all-know-what-we-mean-this-is-simple-if-statement" is appropriate
here because, as I have shown, there are other problems of this kind
not in this category. I think that the OP has been so eager to reason
about this case, because he is so strongly confident in what if-else
should be that he has forgotten that it is not the only problem in
language parsing. If the language you are parsing is not so idiomatic
as mine then one tends to be grateful for warnings

>> No. this is not specific category. This is very broad category of
>> conflicts. Let me show you an example of exactly the same conflict:
>> Expression can have form : {ID : type } expression
>> and expressions can be chained together without any delimiter
>> Then parser must issue a warning that  {a : INTEGER } a + b can be
>> seen as ( + ({a : INTEGER} a) b) or ({a: INTEGER} (+a b)).
>> This is exactly the same conflict yet even you, without knowing what
>> language I am talking about, cannot give the resolution. How can you
>> expect parser to give you any?
>
> I may be misreading your example here, and if so please correct me:
> Your expression ("{a: INTEGER} a+ b") could also be written as "{a: INTEGER}
> a + {b: INTEGER} b", and the only valid parse for that is "(+ ({a: INTEGER}
> a) ({b: INTEGER} b))".  Thus the correct parse for your original expression
> is your first one, "(+ ({a: INTEGER} a) (b))".

hehe, the thing is that I am not sure, I'll have to check in the
standard, but I strongly believe the other interpretation is correct,
at least that's what my parser does at these points. if what you
showed was right then it would be problematic in:
{a:INTEGER}{b:INTEGER} a*b + 1000 as you would have to interpret this
as two expressions ({a:INTEGER}<empty>)(+ etc.) and meaning of this
construct is intended to set scope where some property of 'a' holds,
so having empty scopes is pointless. Anyway, that is not the point.
What I am trying to show is that a) "the correct parse for your
original expression" is non-existent because there are two "correct"
ways (syntactically, of course only one is correct semantically), thus
ambiguity - which ANTLR warned me about and I came to thinking  b) if
language is not widely understood and C-like then having your tool to
match blindly "obvious" behavior of if-then is pointless and error
prone c) making special case for if-then-else is not elegant.

-- 
Greetings
Marcin Rzeźnicki