[antlr-interest] Gated semantic predicates - performance problem

Mon Mar 15 13:19:18 PDT 2010

2010/3/15 Marcin Rzeźnicki <marcin.rzeznicki at gmail.com>:
>
>
> On Mon, Mar 15, 2010 at 2:25 PM, Andrew Haritonkin <thikone at gmail.com>
> wrote:
>>
>> Hello everyone,
>>
>> While writing grammar for PL/SQL I have encountered a problem with
>> performance of generated code.
>>
>
> Hi Andrew,
> First of all - I am not sure why do you need predicates in this part of your
> grammar. In case you haven't got an ambiguity, and I can't spot it in your
> grammar, predicates are not needed at all. They could help if, let's say, OR
> could be unary, where parser could not by itself decide whether 'OR' in:
> condition_and OR condition_and  means binary or unary OR (assuming unary OR
> exists and you have somewhere construct that chains conditions together).
> Perhaps I am missing something so please clarify.
> On the other hand, the quality of code generated by ANTLR is very low. I
> have long stopped using generated code "as is" and perform various
> "amendments" manually, which is painful if you care for automated build
> process. It can be argued that you do not need maintainable code with parser
> generator, but I suppose that not only maintainability, but also performance
> suffers due to lousy code generation. Counter-argument that ANTLR code-gen
> is supposed to be human-readable fails miserably with invention of clever
> DFAs state encoding scheme. My favorite example is:
> if ((LA6_0 == '\t' || (LA6_0 >= ' ' && LA6_0 <= '$')
> || (LA6_0 >= '&' && LA6_0 <= '~')
> || (LA6_0 >= '\u00A0' && LA6_0 <= '\uD7FF') || (LA6_0 >= '\uDC00' && LA6_0
> <= '\uFFFF'))) {
> alt6 = 1;
> } else if (((LA6_0 >= '\uD800' && LA6_0 <= '\uDBFF'))) {
> alt6 = 2;
> } else {
> NoViableAltException nvae = new NoViableAltException(
> "",
> 6,
> 0,
> input);
> throw nvae;
> }
> which is followed, within the same method, just two lines below, by:
> if (input.LA(1) == '\t'
> || (input.LA(1) >= ' ' && input.LA(1) <= '$')
> || (input.LA(1) >= '&' && input.LA(1) <= '~')
> || (input.LA(1) >= '\u00A0' && input.LA(1) <= '\uD7FF')
> || (input.LA(1) >= '\uDC00' && input.LA(1) <= '\uFFFF')) {
> input.consume();
> }
> Let's hope that JIT can eliminate common input.LA(1) calls, but I wouldn't
> count on it.
> Parser code also suffers from mess of unread yet initialized variables which
> may significantly raise GC pressure and lower performance due to constant
> creation of unneeded objects (possibly eliminated by Escape Analysis ?).
> Also, tons of unneeded null checks, static inner classes that can be
> eliminated or merged (DFAs also rule results sometimes) add up to the
> overall picture. I really hope some of this problems will be fixed with new
> releases and I keep my fingers crossed.
>
> --
> Greetings
> Marcin Rzeźnicki
>

Sorry, there is a mistake in one rule:

condition_not
       :       ( 'NOT' )? expression_add
       ;

Predicates needed because condition rules reuse expression rules. But
for recursion in the end of expression chain I need to go to right top
level rule. Either condition top rule or expression top rule.

For example, if we parse condition call stack might look like this:

sql_condition
condition_or
condition_and
condition_not
expression_add
expression_mul
expression_sign
expression_expr
expression_par  <-- at this point we need to go to condition_or
condition_or
condition_and
...
and so on.

In case of expression call stack will be different:

sql_expression
expression_add
expression_mul
expression_sign
expression_expr
expression_par  <-- at this point we need to go to expression_add, not
condition_or!
expression_add
expression_mul
...
and so on.

As you may notice, expression_par rule calls top rule recursively. But
which top rule depends on how it started - from sql_condition or from
sql_expression. That's the place where gated semantic predicate is
needed.

Unfortunately, I don't see how to do it in efficient way. Since
predicates, even gated semantic predicates, evaluated after prediction
(predicates embedded in DFA as last conditions for each possible
path).

Is it possible to shut down lookahead completely (k = 0) and just take
first path for which predicate is evaluated as true?

Andrew