[antlr-interest] Faster expression parsing

Thu Aug 28 13:18:07 PDT 2008

The *giant* advantage in the code example I showed lies in something
behind the scenes: no extra syntactic or semantic predicates were added,
so the prediction runs as fast as it did per-level before. On a grammar
that doesn't require any predicates at the binary expression level, this
means a 5:1 performance difference for the function. I just checked, and
26.96% of the parse time is in 'binary_expression', which is
'prefix_expression' separated by 'binary_operator'. 25.41% of the parse
time is in 'prefix_expression', so the overhead of parsing all 25
non-assignment binary operations in 12 precedence groups in my program
is now 1.5% of the parse time.

Also, the functions build the tree properly accounting for both operator
associativity and precedence groups. The profiler showed the tree
construction overhead (the call to createPrecedenceTree) given a flat
input was 0.27% of the parse time. Keep in mind that my grammar doesn't
use backtracking or semantic predicates, and is <=LL(2) for most
predictions. On a grammar that spends more time in those sections, the
overhead would show as near 0.

In the grammar I'm performance testing, I don't have any lines that look
quite like the one you mentioned. The language I'm parsing allows some
reserved words to be used as variable names, so I had this:

a	: 	literal
		-> ^(literal)
	|	varName
		-> ^(varName)
	|
	...
	;

varName	:	IDENTIFIER
	|	VAR
	|	FUNCTION
	...
	;

So I changed rule a to be:

a	: 	literal
		-> literal
	|	varName
		-> varName
	|	...
	;

'primary_expression' dropped from 7.02% to 6.40%.

It really shows the power of ANTLR3 to sandwich a operator precedence
parser in it that still allows a grammar-defined primary non-terminal,
without any changes to the generator or runtime.

My parser targets the C# runtime and currently parses 1.33MB of source
code / second, so with "only" 2500 source files (8MB) in the project,
it's getting difficult to profile with absolute consistency.

Sam

-----Original Message-----
From: Terence Parr [mailto:parrt at cs.usfca.edu] 
Sent: Thursday, August 28, 2008 1:08 PM
To: Sam Harwell
Cc: antlr-interest
Subject: Re: [antlr-interest] Faster expression parsing

Heh, really cool!  This is a similar approach to:

http://www.antlr.org/wiki/display/~admin/2008/03/23/Faster+expression 
+parsing+for+ANTLR

except that you do the appropriate tree construction and after parsing  
without consideration of operator precedence. Interesting. I wonder  
which is faster.

Note that you can improve the speed slightly when you are grammar by  
removing unnecessary tree stuff.

> primary_expression
>         :       INTEGER
>                 -> ^(INTEGER)
>         |       '(' expression ')'
>                 -> expression
>         ;

primary_expression
         :       INTEGER
         |       '(' expression ')' -> expression
         ;

Let me know if that makes a difference.

Ter