[antlr-interest] Parser performance dropping as a function of line count

Rukmal Fernando rukmal_fernando at yahoo.com
Sun Oct 1 21:45:32 PDT 2006


Dear all, 

First of all, we are somewhat new to ANTLR and have not had any formal education on Parser construction. Secondly, we apologise for the long e-mail, but we would like to give you the complete picture wtih regard to our unusual problem.

We are a Sri Lankan software team developing a code diff / merge tool, and are creating a new version using .Net 2 and ANTLR 2.7.6 to replace the older Flex/ Bison / VC++ tool.

After trying a few PL\SQL grammar files which did not meed our particular needs, we decided to write a new PL\SQL parser from scratch, comprising of a subset of PLSQL features specific to our work. The (Lexer + Parser) grammar is only 670 lines with only one syntactic predicate.

The problem we have is that we have PL\SQL files of around 18K lines of code, consisting of a PL\SQL package with various procedures and functions. We now have some serious performance problems with this. 

As an example, we have a parser error generated at line 460. When the last bit of the file is truncated to bring the file size to 1K lines, the parser takes about 15-16 seconds to reach the error. When the file is truncated to about 2K lines, it takes 29 seconds to reach the error. Likewise, when the file is truncated to 3K and 5K lines, it take rougly 90 and 150 seconds respecitvely to reach the 460th line.

Just for clarification, what we mean by "truncating the file" is to remove procedures etc from the tail end so that the file is still syntactically valid. We are absolutely stumped by how the parser seems to take roughly n-times as much time to reach the same point of the file, when the file's line count is n*1000 lines of code.

We would greatly appreciate if anyone can point out
a. Whether such a drop in performance can be explained as a result of ANTLR itself, or the grammar file
b. Whether this can be fixed, and if so your suggestions.

Thank you in advance.

Rukmal Fernando.
IFS R&D, Sri Lanka.
rukmal UNDERSCORE fernando AT yahoo DOT com

p.s: Performance stats
File length (lines)   Time to reach line 460 (seconds)
1040                     15-16 [stopwatch on error message! :-) ]
2042                     29
3032                     41.5
5057                     70
10087                    153

Estimated time to parse complete file is below.
File line count           Estimated time to parse (seconds)
1040                        37

2042                        128

3032                        272

5057                        767

10087                      3347








More information about the antlr-interest mailing list