[antlr-interest] Disambiguating simple grammar – could anyone help?
Sam Barnett-Cormack
s.barnett-cormack at lancaster.ac.uk
Fri Apr 10 04:43:33 PDT 2009
Tomasz Jastrzebski wrote:
> Hello developers,
> I cannot figure out how to disambiguate the following grammar using
> syntactic predicate, so the range rule takes precedence over the offset
> rule.
> Could anyone advice? I prefer to keep separate range and offset rules
> because of the custom AST tree construction.
> Thanks for any pointers,
I'll take a look...
> grammar test;
> program : (expression)* ;
> expression
> :
> Identifier ((range) => range)?
> | offset
> ;
> range : Integer ('-' Integer)? ;
> offset : ('+' | '-') Integer ;
> Identifier : ('a'..'z')+ ;
> Integer : ('0'..'0')+ ;
> WhiteSpace : (' ' | '\t' | '\r\n' | '\r')+ { $channel=HIDDEN; };
Well, to start with, should the Integer rule be:
Integer : ('0'..'9')+;
There should be no need for disambiguation between range and offset.
They have different start sets. You will presumably have sentences like
foo 10-25 - 10 baz 1-5 bar + 5
Now, while that's hard to read, it shouldn't be hard to parse. The
parser tries to read an expression - it goes for the first alt if it
sees an Identifier, the second if it sees '+' or '-'. Once in the first
alt, if it sees an Integer it does the optional bit, otherwise it exits
and looks for another expression. So, the sentence above is presumably
to be parsed as :
expression(Identifier(foo) range(Integer(10) '-' Integer(25)))
expression(offset('-' Integer(10)))
expression(Identifier(baz) range(Integer(1) '-' Integer(5)))
expression(Identifier(bar))
expression(offset('+' Integer(5)))
Where the problem appears to come is part-way through range. Should:
foo 10 - 15
Be:
expression(Identifier(foo) range(Integer(10) '-' Integer(15)))
or:
expression(Identifier(foo) range(Integer(10)))
expression(offset('-' Integer(15)))
The way you've described the language, that's an inherent ambiguity.
Now, if you want it to be interpreted as a range whenever possible, you
need to deal with the ambiguity on the exit set of range (I think), so
the predicate needs to be there. Lose the predicate from the expression
rule and put it in range:
range : Integer (('-')=>('-' Integer)|) ;
This should prevent range from exiting prematurely. However, if you're
designing this language, I'd recommend getting rid of the inherent
ambiguity by putting in an expression terminator or requiring
start-and-end markers for range (parentheses, maybe), or even not using
'-' for ranges - use '..' instead.
Now, I've not tested this, and might've made a mistake, but I think that
I've got the problem identified there.
--
Sam Barnett-Cormack
More information about the antlr-interest
mailing list