[antlr-interest] Gated semantic predicates - performance problem

Marcin Rzeźnicki marcin.rzeznicki at gmail.com
Mon Mar 15 08:46:02 PDT 2010


On Mon, Mar 15, 2010 at 2:25 PM, Andrew Haritonkin <thikone at gmail.com>wrote:

> Hello everyone,
>
> While writing grammar for PL/SQL I have encountered a problem with
> performance of generated code.
>
>
Hi Andrew,
First of all - I am not sure why do you need predicates in this part of your
grammar. In case you haven't got an ambiguity, and I can't spot it in your
grammar, predicates are not needed at all. They could help if, let's say, OR
could be unary, where parser could not by itself decide whether 'OR' in:
condition_and OR condition_and  means binary or unary OR (assuming unary OR
exists and you have somewhere construct that chains conditions together).
Perhaps I am missing something so please clarify.

On the other hand, the quality of code generated by ANTLR is very low. I
have long stopped using generated code "as is" and perform various
"amendments" manually, which is painful if you care for automated build
process. It can be argued that you do not need maintainable code with parser
generator, but I suppose that not only maintainability, but also performance
suffers due to lousy code generation. Counter-argument that ANTLR code-gen
is supposed to be human-readable fails miserably with invention of clever
DFAs state encoding scheme. My favorite example is:

if ((LA6_0 == '\t' || (LA6_0 >= ' ' && LA6_0 <= '$')
|| (LA6_0 >= '&' && LA6_0 <= '~')
|| (LA6_0 >= '\u00A0' && LA6_0 <= '\uD7FF') || (LA6_0 >= '\uDC00' && LA6_0
<= '\uFFFF'))) {
alt6 = 1;
} else if (((LA6_0 >= '\uD800' && LA6_0 <= '\uDBFF'))) {
alt6 = 2;
} else {
NoViableAltException nvae = new NoViableAltException(
"",
6,
0,
input);

throw nvae;
}

which is followed, within the same method, just two lines below, by:

if (input.LA(1) == '\t'
|| (input.LA(1) >= ' ' && input.LA(1) <= '$')
|| (input.LA(1) >= '&' && input.LA(1) <= '~')
|| (input.LA(1) >= '\u00A0' && input.LA(1) <= '\uD7FF')
|| (input.LA(1) >= '\uDC00' && input.LA(1) <= '\uFFFF')) {
input.consume();

}

Let's hope that JIT can eliminate common input.LA(1) calls, but I wouldn't
count on it.
Parser code also suffers from mess of unread yet initialized variables which
may significantly raise GC pressure and lower performance due to constant
creation of unneeded objects (possibly eliminated by Escape Analysis ?).
Also, tons of unneeded null checks, static inner classes that can be
eliminated or merged (DFAs also rule results sometimes) add up to the
overall picture. I really hope some of this problems will be fixed with new
releases and I keep my fingers crossed.


-- 
Greetings
Marcin Rzeźnicki


More information about the antlr-interest mailing list