[antlr-interest] Syntatic Predicate vs Backtracking Optimizations; Why are Syntactic Predicates always Evaluated?

chris king kingces95 at gmail.com
Wed Aug 24 12:41:09 PDT 2011


I'm working to resolve a C# grammar ambiguity in my
class_member_declarationrule arising because (1) all members are
prefixed with custom attributes (a
recursive rule) and (2) many productions are prefixed with a type (recursive
due to generics). I could solve the first by left-factoring but that's not
as readable. The second would be a super pain to left-factor. So like the
reference says, I solve these problems using backtracking and syntactic
predicates (Chapter 12). And those techniques do solve the problem.

My issue is with the optimizations. Backtracking optimizes the common case
(no custom attributes, non-generic types) by sticking a DFA in front of the
implicit syntactic predicate so those decisions that can be made via regex
(trained monkey). This is the "Java For-Loop" (p. 300) optimization. However
for the hard case backtracking tries to match the entire rule and some rules
like method_declaration have a body (nested_type_declaration is even worse).
This is the "C Function Definition vs. Declaration" ambiguity and Tar's
advice here is not to use backtracking but instead use a Syntactic Predicate
(p. 303, "Although you could simply turn on the auto-backtracking feature,
that is unnecessarily inefficient because the parser will backtrack over the
entire function body").

So to optimize the hard case I use a syntactic predicate and specify just
the prefix for each class_member_declaration rule necessary to choose that
alternative (again p. 303, "A more efficient approach is to use a manually
specified syntactic predicate that provides the minimum necessary lookahead
to distinguish the first alternative from the second."). That optimizes the
hard case well; I don't match method bodies during backtracking. However now
the common case is slow because explicitly specified syntactic predicates
are always evaluated; no DFA; no trained monkey; (p. 302 "Manually specified
syntactic predicates are always evaluated, but those implicitly added by
auto-backtracking mode are not.").

So now I'm feeling a bit stuck. I got a solution but can only optimize the
common case using backtracking or the hard case (or just slightly less
common) using syntactic predicates. So I guess my question is: *Why are
syntactic predicates are always evaluated when a DFA monkey could have
predicted the alternative? *

Thanks,
Chris

p.s.

To make sure my understanding is correct I have some traces. I share them
here for posterity (i.e. Google). For simplicity I only include
class_member_declaration
rules that are ambiguous because of the leading attribute_section -- not
type which I have restricted to only match non-recursive non-generic types.
My common case input is "const int c = 1;" and my hard case input is
"[Foo]const
int c = 1;". Here is the grammar for the back tracking solution:

public
unattributed_class_member_declaration
  options { backtrack=true; }
  : constant_declaration
  | event_declaration

constant_declaration
  : attribute_section* CONST type constant_declarators SEMI_COLON
  ;

event_declaration
  : attribute_section* EVENT type variable_declarators SEMI_COLON
  ;

attribute_section // is recursive
  options { memoize=true; }
  : OPEN_SQUARE_BRACKET attribute_target_specifier? attribute_list COMMA?
CLOSE_SQUARE_BRACKET
  ;

And here is the trace for the common case (it's optimized in that there are
no invocation of syntactic predicates -- pure DFA monkey).

Enter class_member_declaration 49
[@-1,0:4='const',<31>,1:0]
 Enter constant_declaration 50
[@-1,6:8='int',<108>,1:6]
  Enter type 26
   Enter named_type 27
    Enter named_value_type 29
     Enter named_numeric_type 30
      Enter named_integral_type 31
[@-1,10:10='c',<104>,1:10]
      Leave named_integral_type 31
     Leave named_numeric_type 30
    Leave named_value_type 29
   Leave named_type 27
  Leave type 26
  Enter constant_declarators 52
   Enter constant_declarator 53
[@-1,12:12='=',<58>,1:12]
[@-1,14:14='1',<37>,1:14]
    Enter constant_expression 203
     Enter expression 201
       ...
                      Enter literal 113
                       Enter integer_literal 114
[@-1,15:15=';',<158>,1:15]
                       Leave integer_literal 114
                      Leave literal 113
       ...
     Leave expression 201
    Leave constant_expression 203
   Leave constant_declarator 53
  Leave constant_declarators 52
 Leave constant_declaration 50
Leave class_member_declaration 49

Here is the trace for the hard case (it's unoptimized as the backtracking
syntactic predicate [in yellow] is matching tokens past 'const' [in red]) :

Enter class_member_declaration 49
[@-1,0:0='[',<129>,1:0]
[@-1,1:3='Foo',<104>,1:1]
 Enter synpred1_CSharp_fragment 308
  Enter constant_declaration 50
   Enter attribute_section 292
    Enter attribute_list 295
     Enter attribute 296
      Enter attribute_name 297
       Enter type_name 15
        Enter namespace_or_type_name 16
[@-1,4:4=']',<28>,1:4]
        Leave namespace_or_type_name 16
       Leave type_name 15
      Leave attribute_name 297
     Leave attribute 296
    Leave attribute_list 295
[@-1,5:9='const',<31>,1:5]
   Leave attribute_section 292
[@-1,11:13='int',<108>,1:11]
   Enter type 26
    Enter named_type 27
     Enter named_value_type 29
      Enter named_numeric_type 30
       Enter named_integral_type 31
[@-1,15:15='c',<104>,1:15]
       Leave named_integral_type 31
      Leave named_numeric_type 30
     Leave named_value_type 29
    Leave named_type 27
   Leave type 26
   Enter constant_declarators 52
    Enter constant_declarator 53
[@-1,17:17='=',<58>,1:17]
[@-1,19:19='1',<37>,1:19]
     Enter constant_expression 203
      Enter expression 201
       ...
                       Enter literal 113
                        Enter integer_literal 114
[@-1,20:20=';',<158>,1:20]
                        Leave integer_literal 114
                       Leave literal 113
       ...
   Leave constant_declarators 52
  Leave constant_declaration 50
 Leave synpred1_CSharp_fragment 308

Here is the Syntactic Predicate solution (same as backtracking solution with
the addition of syntactic predicates [yellow]):

public
unattributed_class_member_declaration
  options { backtrack=true; }
  : (attribute_section* constant_modifier* CONST)=> constant_declaration
  | (attribute_section* event_modifier* EVENT)=> event_declaration

constant_declaration
  : attribute_section* CONST type constant_declarators SEMI_COLON
  ;

event_declaration
  : attribute_section* EVENT type variable_declarators SEMI_COLON
  ;

attribute_section // is recursive
  options { memoize=true; }
  : OPEN_SQUARE_BRACKET attribute_target_specifier? attribute_list COMMA?
CLOSE_SQUARE_BRACKET
  ;

Here is the trace for the hard case (now optimized as the syntactic
predicate bails soon after it sees 'const'):

Enter class_member_declaration 49
[@-1,0:0='[',<129>,1:0]
[@-1,1:3='Foo',<104>,1:1]
 Enter synpred1_CSharp_fragment 308
  Enter attribute_section 292
   Enter attribute_list 295
    Enter attribute 296
     Enter attribute_name 297
      Enter type_name 15
       Enter namespace_or_type_name 16
[@-1,4:4=']',<28>,1:4]
       Leave namespace_or_type_name 16
      Leave type_name 15
     Leave attribute_name 297
    Leave attribute 296
   Leave attribute_list 295
[@-1,5:9='const',<31>,1:5]
  Leave attribute_section 292
[@-1,11:13='int',<108>,1:11]
 Leave synpred1_CSharp_fragment 308

And here is the trace for the common case (now unoptimized as a DFA monkey
could have predicted the alternative but instead the syntactic predicate was
invoked):

Enter class_member_declaration 49
[@-1,0:4='const',<31>,1:0]
 Enter synpred1_CSharp_fragment 308
[@-1,6:8='int',<108>,1:6]
 Leave synpred1_CSharp_fragment 308


More information about the antlr-interest mailing list