[antlr-interest] Syntatic Predicate vs Backtracking Optimizations; Why are Syntactic Predicates always Evaluated?
chris king
kingces95 at gmail.com
Wed Aug 24 12:41:09 PDT 2011
I'm working to resolve a C# grammar ambiguity in my
class_member_declarationrule arising because (1) all members are
prefixed with custom attributes (a
recursive rule) and (2) many productions are prefixed with a type (recursive
due to generics). I could solve the first by left-factoring but that's not
as readable. The second would be a super pain to left-factor. So like the
reference says, I solve these problems using backtracking and syntactic
predicates (Chapter 12). And those techniques do solve the problem.
My issue is with the optimizations. Backtracking optimizes the common case
(no custom attributes, non-generic types) by sticking a DFA in front of the
implicit syntactic predicate so those decisions that can be made via regex
(trained monkey). This is the "Java For-Loop" (p. 300) optimization. However
for the hard case backtracking tries to match the entire rule and some rules
like method_declaration have a body (nested_type_declaration is even worse).
This is the "C Function Definition vs. Declaration" ambiguity and Tar's
advice here is not to use backtracking but instead use a Syntactic Predicate
(p. 303, "Although you could simply turn on the auto-backtracking feature,
that is unnecessarily inefficient because the parser will backtrack over the
entire function body").
So to optimize the hard case I use a syntactic predicate and specify just
the prefix for each class_member_declaration rule necessary to choose that
alternative (again p. 303, "A more efficient approach is to use a manually
specified syntactic predicate that provides the minimum necessary lookahead
to distinguish the first alternative from the second."). That optimizes the
hard case well; I don't match method bodies during backtracking. However now
the common case is slow because explicitly specified syntactic predicates
are always evaluated; no DFA; no trained monkey; (p. 302 "Manually specified
syntactic predicates are always evaluated, but those implicitly added by
auto-backtracking mode are not.").
So now I'm feeling a bit stuck. I got a solution but can only optimize the
common case using backtracking or the hard case (or just slightly less
common) using syntactic predicates. So I guess my question is: *Why are
syntactic predicates are always evaluated when a DFA monkey could have
predicted the alternative? *
Thanks,
Chris
p.s.
To make sure my understanding is correct I have some traces. I share them
here for posterity (i.e. Google). For simplicity I only include
class_member_declaration
rules that are ambiguous because of the leading attribute_section -- not
type which I have restricted to only match non-recursive non-generic types.
My common case input is "const int c = 1;" and my hard case input is
"[Foo]const
int c = 1;". Here is the grammar for the back tracking solution:
public
unattributed_class_member_declaration
options { backtrack=true; }
: constant_declaration
| event_declaration
constant_declaration
: attribute_section* CONST type constant_declarators SEMI_COLON
;
event_declaration
: attribute_section* EVENT type variable_declarators SEMI_COLON
;
attribute_section // is recursive
options { memoize=true; }
: OPEN_SQUARE_BRACKET attribute_target_specifier? attribute_list COMMA?
CLOSE_SQUARE_BRACKET
;
And here is the trace for the common case (it's optimized in that there are
no invocation of syntactic predicates -- pure DFA monkey).
Enter class_member_declaration 49
[@-1,0:4='const',<31>,1:0]
Enter constant_declaration 50
[@-1,6:8='int',<108>,1:6]
Enter type 26
Enter named_type 27
Enter named_value_type 29
Enter named_numeric_type 30
Enter named_integral_type 31
[@-1,10:10='c',<104>,1:10]
Leave named_integral_type 31
Leave named_numeric_type 30
Leave named_value_type 29
Leave named_type 27
Leave type 26
Enter constant_declarators 52
Enter constant_declarator 53
[@-1,12:12='=',<58>,1:12]
[@-1,14:14='1',<37>,1:14]
Enter constant_expression 203
Enter expression 201
...
Enter literal 113
Enter integer_literal 114
[@-1,15:15=';',<158>,1:15]
Leave integer_literal 114
Leave literal 113
...
Leave expression 201
Leave constant_expression 203
Leave constant_declarator 53
Leave constant_declarators 52
Leave constant_declaration 50
Leave class_member_declaration 49
Here is the trace for the hard case (it's unoptimized as the backtracking
syntactic predicate [in yellow] is matching tokens past 'const' [in red]) :
Enter class_member_declaration 49
[@-1,0:0='[',<129>,1:0]
[@-1,1:3='Foo',<104>,1:1]
Enter synpred1_CSharp_fragment 308
Enter constant_declaration 50
Enter attribute_section 292
Enter attribute_list 295
Enter attribute 296
Enter attribute_name 297
Enter type_name 15
Enter namespace_or_type_name 16
[@-1,4:4=']',<28>,1:4]
Leave namespace_or_type_name 16
Leave type_name 15
Leave attribute_name 297
Leave attribute 296
Leave attribute_list 295
[@-1,5:9='const',<31>,1:5]
Leave attribute_section 292
[@-1,11:13='int',<108>,1:11]
Enter type 26
Enter named_type 27
Enter named_value_type 29
Enter named_numeric_type 30
Enter named_integral_type 31
[@-1,15:15='c',<104>,1:15]
Leave named_integral_type 31
Leave named_numeric_type 30
Leave named_value_type 29
Leave named_type 27
Leave type 26
Enter constant_declarators 52
Enter constant_declarator 53
[@-1,17:17='=',<58>,1:17]
[@-1,19:19='1',<37>,1:19]
Enter constant_expression 203
Enter expression 201
...
Enter literal 113
Enter integer_literal 114
[@-1,20:20=';',<158>,1:20]
Leave integer_literal 114
Leave literal 113
...
Leave constant_declarators 52
Leave constant_declaration 50
Leave synpred1_CSharp_fragment 308
Here is the Syntactic Predicate solution (same as backtracking solution with
the addition of syntactic predicates [yellow]):
public
unattributed_class_member_declaration
options { backtrack=true; }
: (attribute_section* constant_modifier* CONST)=> constant_declaration
| (attribute_section* event_modifier* EVENT)=> event_declaration
constant_declaration
: attribute_section* CONST type constant_declarators SEMI_COLON
;
event_declaration
: attribute_section* EVENT type variable_declarators SEMI_COLON
;
attribute_section // is recursive
options { memoize=true; }
: OPEN_SQUARE_BRACKET attribute_target_specifier? attribute_list COMMA?
CLOSE_SQUARE_BRACKET
;
Here is the trace for the hard case (now optimized as the syntactic
predicate bails soon after it sees 'const'):
Enter class_member_declaration 49
[@-1,0:0='[',<129>,1:0]
[@-1,1:3='Foo',<104>,1:1]
Enter synpred1_CSharp_fragment 308
Enter attribute_section 292
Enter attribute_list 295
Enter attribute 296
Enter attribute_name 297
Enter type_name 15
Enter namespace_or_type_name 16
[@-1,4:4=']',<28>,1:4]
Leave namespace_or_type_name 16
Leave type_name 15
Leave attribute_name 297
Leave attribute 296
Leave attribute_list 295
[@-1,5:9='const',<31>,1:5]
Leave attribute_section 292
[@-1,11:13='int',<108>,1:11]
Leave synpred1_CSharp_fragment 308
And here is the trace for the common case (now unoptimized as a DFA monkey
could have predicted the alternative but instead the syntactic predicate was
invoked):
Enter class_member_declaration 49
[@-1,0:4='const',<31>,1:0]
Enter synpred1_CSharp_fragment 308
[@-1,6:8='int',<108>,1:6]
Leave synpred1_CSharp_fragment 308
More information about the antlr-interest
mailing list