[antlr-interest] guessing w/ predicates

Nigel Sheridan-Smith nbsherid at secsme.org.au
Thu Jan 27 12:35:48 PST 2005


> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Bryan Ewbank
> Sent: Friday, 28 January 2005 3:31 AM
> To: ANTLR Interest
> Subject: [antlr-interest] guessing w/ predicates
> 
> Hi Folks,
> 
> I'm seeing something during the guessing phase of tree walkers that
> has me puzzled.
> 
> If I write a rule that has syntactic predicates for *every*
> alternative, it seems that guessing would pass if one of the
> predicates matched - there's no need for further exploration.
> 
> For example:
> 
> a
>   : (b) => b
>   | c
>   ;
> 
> b
>   : (X) => #(X ( a )* )
>   | (Y) => #(Y ( a )* )
>   | (Z) => #(Z ( a )* )
>   ;
> 
> c
>   : bad:. { err << "illegal node: " << bad; }
>   ;
> 
> I would expect that guessing from production "a", when drilling into "b",
> would
> just check the syntactic predicates for "b" - that is, X|Y|Z, and then
> succeed.
> 
> What I'm seeing is that it checks X, then drills down into the
> production to see if "a" matches recursively -- this seems
> counterintuitive (to me, anyway) because "b" requires going into the
> production when the predicate matches.  If guessing goes into the
> production after the predicate, and the production fails, then
> guessing will reject that path incorrectly.
> 
> I rewrote "a" to be what I expected to happen:
> 
> a
>    : ( X | Y | Z ) => b
>    | c
>    ;
> 
> but this is a maintenance problem because now X, Y, and Z occur in two
> predicates.
> 
> Do I misunderstand the purpose of syntactic predicates?

Go have a look at the code that is generated, and this will help you
understand what is going on.

With or without syntactic predicates, ANTLR will still try to follow the
rules. So seeing X, Y or Z will always go to alternative 'b'. If there is an
ambiguity or lack of enough look-ahead, then the first alternative that
matches is chosen greedily.

When you add a syntactic predicate, the alternatives are still tested in the
same order (so it only really makes sense to put the syntactic predicate on
the first one or two alternatives). However, the predicate simply tests
whether this alternative is suitable before continuing. The lexer/parser is
switched into "guessing" mode and the syntactic predicate tested. If the
predicate is found to be true, then "guessing" mode is turned off, the input
stream is rewound to the beginning again, and the alternative with the
tested syntactic predicate is then followed to consume the input.

For example, with k=1,

a: A A
 | A B;

The first alternative is always chosen... if "A B" is seen then an error is
reported. Alternatively, we can add the syntactic predicate:

a: (A A) => A A
 | A B;

In this case, if the token found is an 'A', then select alternative 1,
switch into "guessing" mode and consume the two 'A' tokens - if the two 'A'
tokens are found, then the predicate is true, so rewind the input and
execute alternative 1 without "guessing". If the predicate is false, then it
tries the next alternative (possibly with another syntactic predicate).

The syntactic predicate can contain tokens and other rules. If you specify
tokens, then the code generated is something like:

	// Switch to guessing mode and try out the syntactic predicate
guessing = true;
	boolean synPred001 = true;
	int x = mark ();

	try {
		consume (A);
		consume (A);

	} catch (ANTLRException e) {
		synPred001 = false;
		guessing = false;
	}

	// Rewind the input stream
	rewind (x);

	// If the syntactic predicate is true, then carry out the
alternative
	if (synPred001)
	{
		consume (A);
		consume (A);

	} else {
		// Try some other alternative
		...
	}

If you use rules in your syntactic predicate, then actual rules will be
called to test out the predicate:

	// Switch to guessing mode and try out the syntactic predicate
	guessing = true;
	boolean synPred001 = true;
	int x = mark ();

	try {
		consume (A);
		mySubrule();

	} catch (ANTLRException e) {
		synPred001 = false;
		guessing = false;
	}

	// Rewind the input stream 
	rewind (x);

	// If the syntactic predicate is true, then carry out the
alternative
	if (synPred001)
	{
		consume (A);
		mySubrule();

	} else {
		// Try some other alternative
		...
	}


However, in your example, it looks like you are trying to add the syntactic
predicate to rule 'a' because there is non-determinism between rules 'b' and
'c' that affects rule 'a'. Unfortunately, you pretty much have to put the
syntactic predicate in rule 'a' but you only need as much input as required
to disambiguate the alternatives. So in this case:

 a: (X X | Y Y) => b
  | c;
 
 b: X X
  | Y Y
  | Z Z;
 
 c: X A
  | Y B;

You can rearrange the ordering of the alternatives to simplify the syntactic
predicates as well, if this makes sense in your particular case.

Hope this helps!

Cheers

Nigel

--
Nigel Sheridan-Smith
PhD research student

Faculty of Engineering
University of Technology, Sydney
Phone: 02 9514 7946
Fax: 02 9514 2435




More information about the antlr-interest mailing list