[antlr-interest] Parser performance dropping as a function of line count

Mon Oct 2 00:39:35 PDT 2006

Hi,

On 10/2/06, Rukmal Fernando <rukmal_fernando at yahoo.com> wrote:
> After trying a few PL/SQL grammar files which did not meet our particular needs, we decided to write a new PL/SQL parser from scratch, covering a subset of PL/SQL features specific to our work. The (Lexer + Parser) grammar is only 670 lines with only one syntactic predicate.
>
> The problem we have is that we have PL/SQL files of around 18K lines of code, consisting of a PL/SQL package with various procedures and functions. We now have some serious performance problems with this.
>
> As an example, we have a parser error generated at line 460. When the end of the file is truncated to bring the file size down to 1K lines, the parser takes about 15-16 seconds to reach the error. When the file is truncated to about 2K lines, it takes 29 seconds to reach the error. Likewise, when the file is truncated to 3K and 5K lines, it takes roughly 90 and 150 seconds, respectively, to reach line 460.

Did you try generating the parser with -traceParser to see exactly
what is happening? The time to reach the same line grows with the
length of the file, which more or less hints that the syntactic
predicate is in a pretty bad place. Looking at the output of
-traceParser will tell you whether that is happening. I'm not 100%
sure whether the default trace behaviour shows that the parser is
backtracking. I think it does; if not, you will have to override the
traceXX methods.

As an aside: did you check whether the lexer is the slowing factor?
ANTLR2's lexers are not really performance animals. You can easily
check this by writing a loop that calls the lexer's nextToken() method
repeatedly until it reaches the end of the input, and timing that loop
on its own.
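
A rough sketch of such a timing loop is below. To keep it runnable
without the ANTLR jar, it uses a hypothetical TokenSource interface
and a fake lexer as stand-ins; these names are mine and not part of
ANTLR. The EOF_TYPE value of 1 is an assumption meant to mirror
ANTLR 2's Token.EOF_TYPE, so check it against your version.

```java
import java.util.Arrays;
import java.util.Iterator;

public class LexerTiming {
    /** Hypothetical stand-in for a generated lexer (not an ANTLR type). */
    interface TokenSource {
        /** Returns the type of the next token, or EOF_TYPE at end of input. */
        int nextTokenType();
    }

    // Assumption: same value as ANTLR 2's Token.EOF_TYPE.
    static final int EOF_TYPE = 1;

    /** Pulls tokens until EOF and returns how many were seen (EOF excluded). */
    static long drain(TokenSource lexer) {
        long count = 0;
        while (lexer.nextTokenType() != EOF_TYPE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Fake lexer that yields three tokens and then EOF forever.
        Iterator<Integer> types = Arrays.asList(4, 5, 6).iterator();
        TokenSource fake = () -> types.hasNext() ? types.next() : EOF_TYPE;

        long start = System.nanoTime();
        long tokens = drain(fake);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(tokens + " tokens in " + elapsedMs + " ms");
    }
}
```

With a real generated lexer you would replace the fake with something
like `() -> lexer.nextToken().getType()` (plus handling for the
checked exception nextToken() declares). If this loop alone accounts
for most of the run time on the 5K-line file, the lexer is the
bottleneck rather than the predicate.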

Cheers,

Ric