[antlr-interest] Grammar generation results in exception

Jim Idle jimi at temporal-wave.com
Mon Jun 25 11:53:59 PDT 2007


The overhead is of course entirely to do with the grammar and th way you
have put it together. If the language you are parsing already has
significant ambiguity issues that can only be solved with large
predicates then the overhead difference may not be as much as when the
grammar could be expressed without too much in the way of heavy
predicates.

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Robin Davies
> Sent: Monday, June 25, 2007 10:56 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Grammar generation results in exception
> 
> 
> > From: "Jim Idle" <jimi at temporal-wave.com>
> > Only use backtracking and memoizing when you are producing a
> prototype
> > or you really do not care at all about speed. It makes things easy
> but
> > the best thing to do for a production grammar is get rid of the
> > ambiguities.
> Given a choice between maintainability with
> decent
> performance, and head-snapping speed, I'll take the former anytime.

However, in this case you might find your performance and
maintainability are involved in the same decision :-). When you turn on
backtracking and memorization for the whole grammar you don't
distinguish with as much precision as you can with the shortest
predicate you can come up with. In your case, the C# generic ambiguity
(nobody ever learns do they? ;-) probably is a hefty predicate, but then
is there anywhere else you need backtracking to resolve it? Can your
predicate not just be ((longrulepath)=>x) | notx ? or turn on
backtracking for that particular rule?

It may be a matter of personal opinion but the explicit predicate in the
rule seems much more maintainable than having to infer what is going on
in backtracking mode. So, I am not sure what your definition of
readability is, but to me it is quite often a case of formatting it
nicely. Predicates are part of the syntax, but leaving them out because
you don't like the look of it is only an option if you really do not
care about parsing performance. Not that ANTLR parsers are slow, but you
are obviously sacrificing a fair bit for the convenience of backtracking
- only you can decide what is worth pursuing I think :-)

> 
> One minor issue that I am aware of is that manual predicates can help
> error
> recovery: e.g.
> 
>        int aFunction() { someerrorherere }
> 
> produces "No viable alternate at "int"" with full backtrack
predicates,
> whereas a  manual predicate of
> 
> member_function_declaration:
>       (type ID '(' arglist_opt ')'  '{')=> ...
> 
> will force error reporting to address the more specific error the
> *somerrorhere* part of the parse tree. As a result, I'm already
> overriding
> backtrack=false in places.

Right, so maybe it is better to turn it off at the grammar level then
turn it back on selectively. You can of course make things a little more
readable by repdicating with rules rather than all the tokens strung out
like that.

> 
> I am facing issue with factorization, though. And I'm not clear on
> whether
> memoization helps me or not. An example:
> 
>    (ex-tempore, not production code...)
> 
> member_declaration:
> options { backtrack=true; memoize = true; }
>        :     function_declaration
>         |   property_declaration
>       .... &c.
> 
> function_declaration
>     :  attributes_opt modifiers_opt type identifier '(' arglist ')'
> block
>     ;
> property_declaration
>     :   attributes_opt modifiers_opt type identifier '{'
> get_set_declarations '}'
>     ;
> 
> Will memoization of member_declaration perform memo-ization of
> attributes_opt and modifiers_opt? There's an easy factorization that
> could
> be performed here, at considerable expense to readability: move
> attibutes_opt and modifiers_opt up to the start of the
> member_declaration
> rule.

Why not:

declarations
	: attributes_opt modifiers_opt type identifier
		(
			  '(' arglist ')' block    	// Function
declaration
			| '{' get_set_declarations	// Property
declaration
		)
	;

Which won't need to backtrack at all I should think? Perhaps you should
look at your rules and see if your ambiguities (other than the known
one) are really just a matter of rearranging them?

Jim


More information about the antlr-interest mailing list