[antlr-interest] Optional keyword causes ambiguity in parser

Ramon Verbruggen Ramon.Verbruggen at quintiq.com
Fri Apr 18 08:55:14 PDT 2008


Gavin,

Thanks for your clear explanation!

> Think of what decisions ANTLR has to make (assuming you've made it 
> optional), and you'll see why it thinks it's ambiguous.
[...]
> But how is it to know whether the 
> Identifier it just saw is a statement or a 
> returnStatement?  Answer: it can't.
>
I did realise there was no way for ANTLR to determine whether an
Identifier was part of a (normal) statement or part of a return
statement; your explanation confirms my line of thought.
I never meant to question the validity of the warning that was given.

> Now, you've said that you want the "return" keyword to be 
> optional.  One way you could do that would be to permit an 
> expression as a valid statement:
>    statementBody:	statementList returnStatement? EOF;
>    returnStatement: 'return' expression;
>    statementList:	(statement ';'*)*;
>    statement:	expression;
>    expression:	addressable( '*' addressable)*;
>    addressable:	Identifier ( '.' Identifier '()' )*;
>    Identifier:	('a'..'z')+;
> 
> There are two downsides to this, of course; the first is that 
> you've widened the possible inputs (which may not be acceptable), 
> and the second is that it isn't very easy to pick off the final 
> expression statement to give it special handling, if you want to.
>
Unfortunately, both downsides you mention are indeed going to cause
grief: the actual grammar in question is much bigger and we have a
rather big installed base.
 
> Another thing you could do is to remove 'addressable' from the 
> list of possible statements.  So long as anything that can match 
> 'expression' cannot also match 'statement', the ambiguity is gone.
> 
This would also break all our existing code, so unfortunately that one
is out too!

> Yet another thing is to re-examine the parentheses in 
> 'addressable'.  Is 'Identifier' really supposed to be an 
> addressable by itself?  Should the parentheses be outside the loop 
> instead (so that 'foo' isn't an addressable, but 'foo()' is)?
> 
You really looked at the grammar in depth! I agree that it looks silly,
but the point of 'addressable' is that you can start with a variable
or argument and chain its methods and attributes together (e.g.
myarg.SomeGetMethod().AnotherMethod().GetAttribute()). So yes, an
Identifier by itself is a valid addressable.
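
To make the shape of 'addressable' concrete, here is a minimal Python
sketch (not ANTLR) of a recogniser for the rule
Identifier ( '.' Identifier '()' )*. It assumes identifiers may contain
upper- and lower-case letters, which is wider than the simplified
('a'..'z')+ lexer rule quoted above; the names ADDRESSABLE and
is_addressable are made up for this illustration.

```python
import re

# Assumption: identifiers are runs of ASCII letters in either case.
# The quoted lexer rule only allows 'a'..'z'; this is widened so that
# the chained example from the mail is accepted.
ADDRESSABLE = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+\(\))*")


def is_addressable(text):
    """Return True if text matches Identifier ( '.' Identifier '()' )*."""
    return ADDRESSABLE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("myarg",
                   "myarg.SomeGetMethod().AnotherMethod().GetAttribute()",
                   "myarg.SomeGetMethod",
                   "foo()"):
        print(sample, is_addressable(sample))
```

Because the parenthesised group sits inside the loop, a bare Identifier
matches with zero iterations, while 'foo()' on its own does not match.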

> If you can't change the input language in this way, then you'll 
> have to resolve the ambiguity with sempreds.  The main one (since 
> it's in a loop) is the 'addressable' in 'statement' -- you have to 
> add a sempred telling it to fail any addressables that should be 
> interpreted as return statements instead.  Presumably they're 
> distinct enough that you can tell the difference.  
> 
Unfortunately they are not distinct at all: the only distinction is
that the last statement in the method must be an expression, optionally
preceded by the 'return' keyword.
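
Since the distinction is purely positional, one way to picture it is
to parse every statement as an expression and decide only afterwards
that the last one is the return value. The sketch below is a
hand-written Python illustration of that idea, not ANTLR output and not
our actual grammar. It assumes lower-case identifiers, at least one ';'
between statements, and an optional trailing ';'; all function names
are hypothetical. Note that it widens the accepted input in exactly the
way described earlier in this thread, because any expression is allowed
as a statement.

```python
import re

# Toy lexer: 'return', lower-case identifiers, '()', '.', '*', ';'.
TOKEN = re.compile(r"\s*(return\b|[a-z]+|\(\)|[.*;])")


def tokenize(text):
    """Split text into tokens; raise SyntaxError on unknown input."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_identifier(token):
    return token.isalpha() and token != "return"


def parse_addressable(tokens, i):
    """addressable: Identifier ( '.' Identifier '()' )*"""
    if i >= len(tokens) or not is_identifier(tokens[i]):
        raise SyntaxError(f"expected Identifier at token {i}")
    parts = [tokens[i]]
    i += 1
    while tokens[i:i + 1] == ["."]:
        if (i + 2 >= len(tokens) or not is_identifier(tokens[i + 1])
                or tokens[i + 2] != "()"):
            raise SyntaxError(f"expected Identifier '()' after '.' at token {i}")
        parts.append(tokens[i + 1] + "()")
        i += 3
    return ".".join(parts), i


def parse_expression(tokens, i):
    """expression: addressable ( '*' addressable )*"""
    operand, i = parse_addressable(tokens, i)
    operands = [operand]
    while tokens[i:i + 1] == ["*"]:
        operand, i = parse_addressable(tokens, i + 1)
        operands.append(operand)
    return " * ".join(operands), i


def parse_body(text):
    """Return (statements, return_expression) for a method body.

    The last expression is treated as the return value whether or not
    it is preceded by 'return'; 'return' anywhere else is an error.
    """
    tokens = tokenize(text)
    statements, i, saw_return = [], 0, False
    while i < len(tokens):
        if saw_return:
            raise SyntaxError("'return' must introduce the last statement")
        if tokens[i] == "return":
            saw_return = True
            i += 1
        expression, i = parse_expression(tokens, i)
        statements.append(expression)
        if i < len(tokens) and tokens[i] != ";":
            raise SyntaxError(f"expected ';' at token {i}")
        while tokens[i:i + 1] == [";"]:
            i += 1
    if not statements:
        raise SyntaxError("a body needs at least one expression")
    return statements[:-1], statements[-1]


if __name__ == "__main__":
    print(parse_body("a; b.c(); return d * e"))
    print(parse_body("a; b.c(); d * e"))
```

Both calls yield the same split, which is the point: no lookahead
decision is needed while parsing, because the final statement is
singled out after the whole body has been read. The cost is that the
special handling of the last expression moves out of the grammar and
into a post-processing step.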

> (If you can't tell the difference, then you probably shouldn't be
> trying to remove that keyword.)
>
Point taken, I guess I just didn't realise that this relatively small
change to the language actually makes it very hard to parse.

> Beyond these suggestions, I'm not really sure.  I've usually been 
> working with DSLs that I'm in full control of, so I've tended to 
> design languages that are easy to parse :)
>
Good point! This whole exercise would wreak havoc on a very workable
grammar for a (reasonably) well-defined language.

It may seem as though I've blown off all your suggestions, but you've
actually helped me a lot: I now feel comfortable saying that making the
'return' keyword optional requires some significant changes to the
grammar and (therefore) to our language, which is not at all what I had
in mind when I set out to make the 'return' keyword optional.

Thanks again for your useful input.

Ramon Verbruggen



