[antlr-interest] Semantic predicates

David-Sarah Hopwood david-sarah at jacaranda.org
Mon Oct 19 18:09:55 PDT 2009


Jim Idle wrote:
> I don’t think that there any bugs here. The gated predicates work fine for me
> (not that that means there are therefore no bugs ;-), but that it is more
> likely there is a misuse going on somewhere in the grammar).

Having investigated further, there is certainly very strange and error-prone
behaviour for examples similar to Robert Wentworth's, whether or not it is
considered to be an ANTLR bug. Consider the following grammar, which is a
simplified excerpt of an ECMAScript 5 grammar (it parses property
assignments in ES5 object literals):

  grammar Test;

  options {
    language = Java;
    backtrack = true;
  }

  @parser::members {
    /** Test whether token is not null and has the given text. */
    protected boolean is(Token token, String string) {
      return token != null && token.getText().equals(string);
    }
  }

  COLON : ':';
  ID    : ('a'..'z')+;
  get   : { is(input.LT(1), "get") }?=> ID;
  set   : { is(input.LT(1), "set") }?=> ID;
  WS    : ' ' { $channel = HIDDEN; };

  top : propertyAssignment EOF;

  propertyAssignment
    : ID COLON { System.out.println("ID COLON"); }
    | get ID   { System.out.println("get ID"); }
    | set ID   { System.out.println("set ID"); }
    ;

(Note that "get" and "set" are intended to be treated as identifiers
in all contexts except for the second and third alternatives of
propertyAssignment. Yes, I know that there are other approaches to
handling such context-dependent identifiers.)

The output of this grammar on various inputs is:

"a:"
ID COLON [as intended]

"get:"
ID COLON [as intended]

"set:"
ID COLON [as intended]

"get a"
line 1:0 no viable alternative at input 'get' [not as intended]

"set a"
set ID [as intended! why does this work when "get a" doesn't?]


Strangely, "get a" does print "get ID" when the 'backtrack' option is
set to false.

Note that manually hoisting the predicates into propertyAssignment
still doesn't work:

  propertyAssignment
    : ID COLON { System.out.println("ID COLON"); }
    | { is(input.LT(1), "get") }?=> ID ID { System.out.println("get ID"); }
    | { is(input.LT(1), "set") }?=> ID ID { System.out.println("set ID"); }
    ;

In any case, we have an apparently working grammar that compiles without
warnings, but fails when backtracking is enabled. This certainly violates
the Principle of Least Astonishment.

As far as I can see, the generated code for propertyAssignment (shown here
for the case when the predicates are manually hoisted) is simply wrong:

  // T:\\jacaranda\\old\\Test.g:25:5: ( ID COLON | {...}? => ID ID | {...}?
=> ID ID )
  int alt1=3;
  int LA1_0 = input.LA(1);

  if ( (LA1_0==ID) ) {
    int LA1_1 = input.LA(2);

    if ( (LA1_1==COLON) ) {
      alt1=1;
    }
    else if ( (LA1_1==ID) && (( is(input.LT(1), "set") ))) {
      int LA1_3 = input.LA(3);

      if ( ((synpred2_Test()&&( is(input.LT(1), "get") ))) ) {
        alt1=2;
      }
      else if ( (( is(input.LT(1), "set") )) ) {
        alt1=3;
      }
      else {
...

Notice that the predicate for "get" is only tested inside the 'if'
statement that has already checked for "set".

Without backtracking, the condition for that 'if' statement is
"(LA1_1==ID) && ((( is(input.LT(1), "get") )||( is(input.LT(1), "set") ))))"
which looks correct to me (even if it is somewhat inefficient to be
evaluating the predicates multiple times, rather than assigning them
to temporaries).

I.e. the bug *appears* to be that, when backtracking is enabled for a
rule and there is more than one gated semantic predicate after hoisting,
not all of the predicates are taken into account when generating some of
the conditions for predicting the alternative.

Note that use of syntactic predicates implicitly enables non-global
backtracking, so it is plausible that similar problems could also occur
when there is more than one syntactic predicate after hoisting, even if
the grammar does not use the global 'backtrack = true' option. I didn't
investigate that case, though.

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com



More information about the antlr-interest mailing list