[antlr-interest] Disambiguating simple grammar – could anyone help?

Sam Barnett-Cormack s.barnett-cormack at lancaster.ac.uk
Fri Apr 10 04:43:33 PDT 2009


Tomasz Jastrzebski wrote:
> Hello developers,
> I cannot figure out how to disambiguate the following grammar using 
> syntactic predicate, so the range rule takes precedence over the offset 
> rule.
> Could anyone advice? I prefer to keep separate range and offset rules 
> because of the custom AST tree construction.
> Thanks for any pointers,

I'll take a look...

> grammar test;
> program : (expression)* ;
> expression
>  :
>    Identifier ((range) => range)?
>  | offset
>  ;
> range : Integer ('-' Integer)? ;
> offset : ('+' | '-') Integer ;
> Identifier : ('a'..'z')+ ;
> Integer : ('0'..'0')+ ;
> WhiteSpace : (' ' | '\t' | '\r\n' | '\r')+ { $channel=HIDDEN; };

Well, to start with, should the Integer rule be:

Integer : ('0'..'9')+;

There should be no need for disambiguation between range and offset. 
They have different start sets. You will presumably have sentences like

foo 10-25 - 10 baz 1-5 bar + 5

Now, while that's hard to read, it shouldn't be hard to parse. The 
parser tries to read an expression - it goes for the first alt if it 
sees an Identifier, the second if it sees  '+' or '-'. Once in the first 
alt, if it sees an Integer it does the optional bit, otherwise it exits 
and looks for another expression. So, the sentence above is presumably 
to be parsed as :

expression(Identifier(foo) range(Integer(10) '-' Integer(25)))
expression(offset('-' Integer(10)))
expression(Identifier(baz) range(Integer(1) '-' Integer(5)))
expression(Identifier(bar))
expression(offset('+' Integer(5)))

Where the problem appears to come is part-way through range. Should:

foo 10 - 15

Be:

expression(Identifier(foo) range(Integer(10) '-' Integer(15)))

or:

expression(Identifier(foo) range(Integer(10)))
expression(offset('-' Integer(15)))

The way you've described the language, that's an inherent ambiguity. 
Now, if you want it to be interpreted as a range whenever possible, you 
need to deal with the ambiguity on the exit set of range (I think), so 
the predicate needs to be there. Lose the predicate from the expression 
rule and put it in range:

range : Integer (('-')=>('-' Integer)|) ;

This should prevent range from exiting prematurely. However, if you're 
designing this language, I'd recommend getting rid of the inherent 
ambiguity by putting in an expression terminator or requiring 
start-and-end markers for range (parentheses, maybe), or even not using 
'-' for ranges - use '..' instead.

Now, I've not tested this, and might've made a mistake, but I think that 
I've got the problem identified there.

-- 
Sam Barnett-Cormack


More information about the antlr-interest mailing list