[antlr-interest] Grammar generation results in exception

Mon Jun 25 10:55:56 PDT 2007

> From: "Jim Idle" <jimi at temporal-wave.com>
> Only use backtracking and memoizing when you are producing a prototype
> or you really do not care at all about speed. It makes things easy but
> the best thing to do for a production grammar is get rid of the
> ambiguities.

I'm curious about this. I'm approaching the point at which I need to convert 
my grammar to non-backtracking (if neccessary), and to optimize (as 
required). And I'm not entirely clear on the performance implications of 
various optimizations. Given a choice between maintainability with decent 
performance, and head-snapping speed, I'll take the former anytime.

Right now, m grammar seems to perform parsing of test cases correctly. At 
some point, very soon, I need to address the backtracking issue.

To the maximum extent possible, I'd like to preserve readability. And it's 
not really clear to me, yet, how various optimizations trade off. In this 
particular case, (C# 2.0) the grammar does have a standard-mandated 
ambiguity that requires a heavy-weight predicate, and less-than-wonderfully 
optimal factoring of recursive expression, which probably may madnate the 
use of additional predicates(e.g. the parsing rules for lvalues are 
completely different from those for rvalue).  And many of the expression 
rules recurse almost arbitrarily into creative and unique points in the 
expression ruleset (e.g 'new' expressions are parsed at different points, 
using different parsng rules depending on whether the target type is an 
array or an object type. Very strange). The current grammar is very unhappy 
without backtracking enabled. And I'm facing a fairly significant chunk of 
work to resolve the ambiguities manually.

As far as I can make out, the backtracking predicates only seem to get 
exercised when the outcome of the parse can't be predicted through LL(*). 
So, wouldn't this imply that there is -- in fact -- not a significant 
overhead to leaving backtracking on? My strong suspicion, at this point, is 
that I'm going to need heavy duty predicates through most of the expression 
parse tree, anway, and most of the expensive part of the statement parse 
tree.

This leads me to think that any backtracing that does take place will have 
to take place anyway, whether backtrack is true, or whether I use manually 
constructed predicates. So is there really a significant performance 
difference, given that LL(*) lookahead seems to be used to predict the 
correct branch in the absense of ambiguities?

One minor issue that I am aware of is that manual predicates can help error 
recovery: e.g.

       int aFunction() { someerrorherere }

produces "No viable alternate at "int"" with full backtrack predicates, 
whereas a  manual predicate of

member_function_declaration:
      (type ID '(' arglist_opt ')'  '{')=> ...

will force error reporting to address the more specific error the 
*somerrorhere* part of the parse tree. As a result, I'm already overriding 
backtrack=false in places.

I am facing issue with factorization, though. And I'm not clear on whether 
memoization helps me or not. An example:

   (ex-tempore, not production code...)

member_declaration:
options { backtrack=true; memoize = true; }
       :     function_declaration
        |   property_declaration
      .... &c.

function_declaration
    :  attributes_opt modifiers_opt type identifier '(' arglist ')' block
    ;
property_declaration
    :   attributes_opt modifiers_opt type identifier '{' 
get_set_declarations '}'
    ;

Will memoization of member_declaration perform memo-ization of 
attributes_opt and modifiers_opt? There's an easy factorization that could 
be performed here, at considerable expense to readability: move 
attibutes_opt and modifiers_opt up to the start of the member_declaration 
rule.

In the end, I'm guessing that I'm just going to have to do it and see. But 
I'm curious whether anyone's at a point that they can speculate as to the 
relative merits and performance implications of various optimizations.