[antlr-interest] Syntatic Predicate vs Backtracking Optimizations; Why are Syntactic Predicates always Evaluated?

Jim Idle jimi at temporal-wave.com
Wed Aug 24 13:03:51 PDT 2011


The only good way to do this is via left factoring. Left factoring the
common prefixes to declarations is in fact more readable not less. You
also allow ALL the possible types and declarations and then report any
erroneous ones. This all works perfectly in my C# 3.5 grammars for
instance.

C function vs prototyping is best solved by left factoring as is basically
anything else that can be left factored to be honest. All backtracking can
do is try and parser every option in order until one works that it has not
tried before. Some syntactic predicates can be removed by ANTLR but
generally they are not. If you feel that they should be then it would
generally indicate that you can refactor your grammar. Sometimes the
reasons that they are required are for decisions far away.

Generally if you need backtracking or predicates using more than a token
or two or three then you would have a case for refactoring. I have only
used backtracking for a VBScript grammar where because the syntax is very
stupid and it is interpreted, there are some ambiguities that can only be
resolved by pseudo executing it, which is what the backtracking mode fakes
for me.

I know it can be a pain if you have typed in 200 rules and realize that
you now have to refactor, but it is well worth it if you need performance
and good error messages.

I can't comment on your specific cases without seeing the grammar and as I
sell a C# grammar, then it is best I don't see yours ;)


Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of chris king
> Sent: Wednesday, August 24, 2011 12:41 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Syntatic Predicate vs Backtracking
> Optimizations; Why are Syntactic Predicates always Evaluated?
>
> I'm working to resolve a C# grammar ambiguity in my
> class_member_declarationrule arising because (1) all members are
> prefixed with custom attributes (a recursive rule) and (2) many
> productions are prefixed with a type (recursive due to generics). I
> could solve the first by left-factoring but that's not as readable. The
> second would be a super pain to left-factor. So like the reference
> says, I solve these problems using backtracking and syntactic
> predicates (Chapter 12). And those techniques do solve the problem.
>
> My issue is with the optimizations. Backtracking optimizes the common
> case (no custom attributes, non-generic types) by sticking a DFA in
> front of the implicit syntactic predicate so those decisions that can
> be made via regex (trained monkey). This is the "Java For-Loop" (p.
> 300) optimization. However for the hard case backtracking tries to
> match the entire rule and some rules like method_declaration have a
> body (nested_type_declaration is even worse).
> This is the "C Function Definition vs. Declaration" ambiguity and Tar's
> advice here is not to use backtracking but instead use a Syntactic
> Predicate (p. 303, "Although you could simply turn on the auto-
> backtracking feature, that is unnecessarily inefficient because the
> parser will backtrack over the entire function body").
>
> So to optimize the hard case I use a syntactic predicate and specify
> just the prefix for each class_member_declaration rule necessary to
> choose that alternative (again p. 303, "A more efficient approach is to
> use a manually specified syntactic predicate that provides the minimum
> necessary lookahead to distinguish the first alternative from the
> second."). That optimizes the hard case well; I don't match method
> bodies during backtracking. However now the common case is slow because
> explicitly specified syntactic predicates are always evaluated; no DFA;
> no trained monkey; (p. 302 "Manually specified syntactic predicates are
> always evaluated, but those implicitly added by auto-backtracking mode
> are not.").
>
> So now I'm feeling a bit stuck. I got a solution but can only optimize
> the common case using backtracking or the hard case (or just slightly
> less
> common) using syntactic predicates. So I guess my question is: *Why are
> syntactic predicates are always evaluated when a DFA monkey could have
> predicted the alternative? *
>
> Thanks,
> Chris
>
> p.s.
>
> To make sure my understanding is correct I have some traces. I share
> them here for posterity (i.e. Google). For simplicity I only include
> class_member_declaration rules that are ambiguous because of the
> leading attribute_section -- not type which I have restricted to only
> match non-recursive non-generic types.
> My common case input is "const int c = 1;" and my hard case input is
> "[Foo]const int c = 1;". Here is the grammar for the back tracking
> solution:
>
> public
> unattributed_class_member_declaration
>   options { backtrack=true; }
>   : constant_declaration
>   | event_declaration
>
> constant_declaration
>   : attribute_section* CONST type constant_declarators SEMI_COLON
>   ;
>
> event_declaration
>   : attribute_section* EVENT type variable_declarators SEMI_COLON
>   ;
>
> attribute_section // is recursive
>   options { memoize=true; }
>   : OPEN_SQUARE_BRACKET attribute_target_specifier? attribute_list
> COMMA?
> CLOSE_SQUARE_BRACKET
>   ;
>
> And here is the trace for the common case (it's optimized in that there
> are no invocation of syntactic predicates -- pure DFA monkey).
>
> Enter class_member_declaration 49
> [@-1,0:4='const',<31>,1:0]
>  Enter constant_declaration 50
> [@-1,6:8='int',<108>,1:6]
>   Enter type 26
>    Enter named_type 27
>     Enter named_value_type 29
>      Enter named_numeric_type 30
>       Enter named_integral_type 31
> [@-1,10:10='c',<104>,1:10]
>       Leave named_integral_type 31
>      Leave named_numeric_type 30
>     Leave named_value_type 29
>    Leave named_type 27
>   Leave type 26
>   Enter constant_declarators 52
>    Enter constant_declarator 53
> [@-1,12:12='=',<58>,1:12]
> [@-1,14:14='1',<37>,1:14]
>     Enter constant_expression 203
>      Enter expression 201
>        ...
>                       Enter literal 113
>                        Enter integer_literal 114 [@-
> 1,15:15=';',<158>,1:15]
>                        Leave integer_literal 114
>                       Leave literal 113
>        ...
>      Leave expression 201
>     Leave constant_expression 203
>    Leave constant_declarator 53
>   Leave constant_declarators 52
>  Leave constant_declaration 50
> Leave class_member_declaration 49
>
> Here is the trace for the hard case (it's unoptimized as the
> backtracking syntactic predicate [in yellow] is matching tokens past
> 'const' [in red]) :
>
> Enter class_member_declaration 49
> [@-1,0:0='[',<129>,1:0]
> [@-1,1:3='Foo',<104>,1:1]
>  Enter synpred1_CSharp_fragment 308
>   Enter constant_declaration 50
>    Enter attribute_section 292
>     Enter attribute_list 295
>      Enter attribute 296
>       Enter attribute_name 297
>        Enter type_name 15
>         Enter namespace_or_type_name 16
> [@-1,4:4=']',<28>,1:4]
>         Leave namespace_or_type_name 16
>        Leave type_name 15
>       Leave attribute_name 297
>      Leave attribute 296
>     Leave attribute_list 295
> [@-1,5:9='const',<31>,1:5]
>    Leave attribute_section 292
> [@-1,11:13='int',<108>,1:11]
>    Enter type 26
>     Enter named_type 27
>      Enter named_value_type 29
>       Enter named_numeric_type 30
>        Enter named_integral_type 31
> [@-1,15:15='c',<104>,1:15]
>        Leave named_integral_type 31
>       Leave named_numeric_type 30
>      Leave named_value_type 29
>     Leave named_type 27
>    Leave type 26
>    Enter constant_declarators 52
>     Enter constant_declarator 53
> [@-1,17:17='=',<58>,1:17]
> [@-1,19:19='1',<37>,1:19]
>      Enter constant_expression 203
>       Enter expression 201
>        ...
>                        Enter literal 113
>                         Enter integer_literal 114 [@-
> 1,20:20=';',<158>,1:20]
>                         Leave integer_literal 114
>                        Leave literal 113
>        ...
>    Leave constant_declarators 52
>   Leave constant_declaration 50
>  Leave synpred1_CSharp_fragment 308
>
> Here is the Syntactic Predicate solution (same as backtracking solution
> with the addition of syntactic predicates [yellow]):
>
> public
> unattributed_class_member_declaration
>   options { backtrack=true; }
>   : (attribute_section* constant_modifier* CONST)=>
> constant_declaration
>   | (attribute_section* event_modifier* EVENT)=> event_declaration
>
> constant_declaration
>   : attribute_section* CONST type constant_declarators SEMI_COLON
>   ;
>
> event_declaration
>   : attribute_section* EVENT type variable_declarators SEMI_COLON
>   ;
>
> attribute_section // is recursive
>   options { memoize=true; }
>   : OPEN_SQUARE_BRACKET attribute_target_specifier? attribute_list
> COMMA?
> CLOSE_SQUARE_BRACKET
>   ;
>
> Here is the trace for the hard case (now optimized as the syntactic
> predicate bails soon after it sees 'const'):
>
> Enter class_member_declaration 49
> [@-1,0:0='[',<129>,1:0]
> [@-1,1:3='Foo',<104>,1:1]
>  Enter synpred1_CSharp_fragment 308
>   Enter attribute_section 292
>    Enter attribute_list 295
>     Enter attribute 296
>      Enter attribute_name 297
>       Enter type_name 15
>        Enter namespace_or_type_name 16
> [@-1,4:4=']',<28>,1:4]
>        Leave namespace_or_type_name 16
>       Leave type_name 15
>      Leave attribute_name 297
>     Leave attribute 296
>    Leave attribute_list 295
> [@-1,5:9='const',<31>,1:5]
>   Leave attribute_section 292
> [@-1,11:13='int',<108>,1:11]
>  Leave synpred1_CSharp_fragment 308
>
> And here is the trace for the common case (now unoptimized as a DFA
> monkey could have predicted the alternative but instead the syntactic
> predicate was
> invoked):
>
> Enter class_member_declaration 49
> [@-1,0:4='const',<31>,1:0]
>  Enter synpred1_CSharp_fragment 308
> [@-1,6:8='int',<108>,1:6]
>  Leave synpred1_CSharp_fragment 308
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address


More information about the antlr-interest mailing list