[antlr-interest] Progressive Slowdown in Parsing

Johannes Luber JALuber at gmx.de
Mon Dec 22 05:17:52 PST 2008


> Yes,  I have the option backtracking turned on on the grammar level.
> 
> My test with syntactic predicates based on tokens like below, so they were
> not very complicated, but they resulted in no performance gain.
> 
> Here is the start of my grammar:
> 
> grammar VisionForm;
> 
> options {
>     	language=CSharp2;
> 	backtrack=true;
> //	memoize=true;
> 	rewrite=true;
> 	output=template;
> }

There's already one problem - memoize is turned off. Without it, backtracking is far slower than it has to be, as memoizing is nothing but caching of earlier results. Of course, the trade-off is higher memory consumption, but I doubt that it will be a real problem. Possibly <http://antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar> helps. 

Regarding your question in your other email what GetKindOfOps does: I've never used TokenRewriteStream, so I've never looked into that method. But considering your translation example I take it that 4GL requires a full tree. If you have problems with your AST, you may need several passes. The first pass to create a rudimentary AST and collect required information, a second pass to make new AST ith using the collected information, then use templates for output. Possibly more than one transformation pass is required. Doing things in passes simplifies the design in general.

Johannes
> 
> tokens {
> 	ACCEPT			=	'ACCEPT';
> 	ACTION			=	'ACTION';
> 	ADD			=	'ADD';
> 	AFTER			= 	'AFTER';
> 	ALL			=	'ALL';
> 	AMOUNT		=	'AMOUNT';
> 	AND			=	'AND';
> 	ARE			=	'ARE';
> 	AS			=	'AS';
> 	ASC			=	'ASC';
> 	AUD_ACTION		=	'AUD_ACTION';
> 	AUD_ON_ENTRY		=	'AUD_ON_ENTRY';
> 	AUTO_COMMIT		=	'AUTO_COMMIT';
> 	AUTO_ZOOM		=	'AUTO_ZOOM';
> 	BACKGROUND		=	'BACKGROUND';
> 	BEFORE			=	'BEFORE';
> 	BEGIN			=	'BEGIN';
> 	BEGIN_SQL		=	'BEGIN_SQL';
> 	BETWEEN		=	'BETWEEN';
> 	BINARY			=	'BINARY';
> 
> 
> The grammar itself is quite long (1800 lines), because it parses 4GL and 
> (partially) SQL code.
> 
> I can post it, if it helps.
> 
> Thanks for your help !
> 
> Andreas
> 
> --------------------------------------------------
> From: "Gavin Lambert" <antlr at mirality.co.nz>
> Sent: Monday, December 22, 2008 11:21 AM
> To: "A. Saake" <asaake at hotmail.de>; <antlr-interest at antlr.org>
> Subject: Re: [antlr-interest] Progressive Slowdown in Parsing
> 
> > At 22:54 22/12/2008, A. Saake wrote:
> >>The first 2000 lines can be processed in under 1 min. If I parse the
> whole 
> >>script, time increases to 15 minutes. For a correct migration of this 
> >>script (it's an include file), I would have to embed it into another
> 3500 
> >>lines code script, and I'm afraid that it will need a very long time. 
> >>Because of the variable system of the 4GL (declaration of variables is
> not 
> >>necessary, so there's no scope, I have to estimate it from context), I 
> >>will have to run it many times.
> >>
> >>To find out, if the slowdown is from my grammar, I tried lot's of 
> >>syntactic predicates and so on, until I used a profiler, which names 
> >>GetKindOfOps as responsible for nearly 80% of the runtime.
> >>
> >>My grammar is a combined lexer and parser, output is template and I use 
> >>the token rewrite mechanism.
> >
> > Do you use backtracking at all, or large syntactic predicates?  Either 
> > will reduce the parsing speed of a grammar.
> >
> > Additionally, the smaller your tokens are the more work the parser is 
> > typically required to do (so single-character tokens are generally a bad
> > idea).
> >
> > It's hard to be more specific without seeing your grammar, though.
> >
> > 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
Psssst! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger


More information about the antlr-interest mailing list