[antlr-interest] Progressive Slowdown in Parsing

Mon Dec 22 02:54:45 PST 2008

Yes, I have the backtrack option turned on at the grammar level.

My tests with syntactic predicates were based on tokens like those below, so 
they were not very complicated, but they resulted in no performance gain.
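
To illustrate the kind of token-based predicate meant here, a minimal sketch 
in ANTLR 3 syntax follows. The rule names statement, acceptStatement and 
addStatement, and the ID token, are hypothetical and not taken from the actual 
grammar; only the ACCEPT and ADD tokens appear in the token list.

```
// Hypothetical sketch: each predicate looks at a single keyword token
// before committing to an alternative.
statement
	:	(ACCEPT)=> acceptStatement
	|	(ADD)=> addStatement
	;

// Placeholder rules so the fragment is complete; ID is an assumed lexer rule.
acceptStatement
	:	ACCEPT ID
	;

addStatement
	:	ADD ID
	;
```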

Here is the start of my grammar:

grammar VisionForm;

options {
	language=CSharp2;
	backtrack=true;
//	memoize=true;
	rewrite=true;
	output=template;
}

tokens {
	ACCEPT			=	'ACCEPT';
	ACTION			=	'ACTION';
	ADD			=	'ADD';
	AFTER			= 	'AFTER';
	ALL			=	'ALL';
	AMOUNT		=	'AMOUNT';
	AND			=	'AND';
	ARE			=	'ARE';
	AS			=	'AS';
	ASC			=	'ASC';
	AUD_ACTION		=	'AUD_ACTION';
	AUD_ON_ENTRY		=	'AUD_ON_ENTRY';
	AUTO_COMMIT		=	'AUTO_COMMIT';
	AUTO_ZOOM		=	'AUTO_ZOOM';
	BACKGROUND		=	'BACKGROUND';
	BEFORE			=	'BEFORE';
	BEGIN			=	'BEGIN';
	BEGIN_SQL		=	'BEGIN_SQL';
	BETWEEN		=	'BETWEEN';
	BINARY			=	'BINARY';

The grammar itself is quite long (1800 lines), because it parses 4GL and 
(partially) SQL code.

I can post it, if it helps.

Thanks for your help!

Andreas

--------------------------------------------------
From: "Gavin Lambert" <antlr at mirality.co.nz>
Sent: Monday, December 22, 2008 11:21 AM
To: "A. Saake" <asaake at hotmail.de>; <antlr-interest at antlr.org>
Subject: Re: [antlr-interest] Progressive Slowdown in Parsing

> At 22:54 22/12/2008, A. Saake wrote:
>>The first 2000 lines can be processed in under 1 min. If I parse the whole 
>>script, the time increases to 15 minutes. For a correct migration of this 
>>script (it's an include file), I would have to embed it into another 
>>3500-line script, and I'm afraid that it will take a very long time. 
>>Because of the variable system of the 4GL (declaring variables is not 
>>necessary, so there's no scope; I have to estimate it from context), I 
>>will have to run it many times.
>>
>>To find out whether the slowdown comes from my grammar, I tried lots of 
>>syntactic predicates and so on, until I used a profiler, which names 
>>GetKindOfOps as responsible for nearly 80% of the runtime.
>>
>>My grammar is a combined lexer and parser, output is template and I use 
>>the token rewrite mechanism.
>
> Do you use backtracking at all, or large syntactic predicates?  Either 
> will reduce the parsing speed of a grammar.
>
> Additionally, the smaller your tokens are the more work the parser is 
> typically required to do (so single-character tokens are generally a bad 
> idea).
>
> It's hard to be more specific without seeing your grammar, though.
>
>