[antlr-interest] Memory requirements of C runtime when backtracking

Ivan Brezina ibre5041 at ibrezina.net
Wed May 4 05:20:06 PDT 2011


Hi all
doing some unit test of Oracle SQL grammar I'm facing problems with  
memory requirements when parsing input having many parenthesis.

In general the grammar can have three types of statements which can be  
enclosed in parens.

1. value expression like (A),(1+2).
For an sql_expression there is set of rules using numerical operator
sql_expression -> expr_add -> expr_mul -> expr_sign -> expr_pow -> expr_paren

2. logical expression like (A > 2) or (A is null)
For an condition expression there is a set of rules using locical operator
sql_condition -> condition_or -> condition_and -> condition_not ->  
condition_paren

3. a subquery. It is an sql statement enclosed in parenthesis
like (SELECT * FROM dual).

These 3 types can by combined in many ways.
For example ((SELECT count(1) from dual) > 1)

While testing if found queries - probably generated by some evil sick robot -
whose require more than 8B of RAM to parse even they are quite short

For example:
SELECT *
FROM TABLE1, TABLE2
WHERE
(
  (
   (
    (
     (
      (
       (
        (
         (
  	 (
  	  (
  	   (
  	    (
  	     (
  	      (
	       ( TABLE2.DT = '2' ) OR ( TABLE2.DT = '3' )
   	      ) AND ( TABLE2.CODE < 9 )
   	     ) AND ( TABLE2.WH = 'XXX' )
   	    ) AND ( TABLE1.ID = '0000001' )
   	   ) AND ( ( TABLE1.ATTR_1 IS NULL ) OR ( TABLE1._ATTR1 = '*' ) )
   	  ) AND ( ( TABLE1.ATTR_2 IS NULL ) OR ( TABLE1._ATTR2 = '*' ) )
   	 ) AND ( ( TABLE1.ATTR_3 IS NULL ) OR ( TABLE1._ATTR3 = '*' ) )
      	) AND ( ( TABLE1.ATTR_4 IS NULL ) OR ( TABLE1._ATTR4 = '*' ) )
        ) AND ( ( TABLE1.ATTR_5 IS NULL ) OR ( TABLE1._ATTR5 = '*' ) )
       ) AND ( ( TABLE2.TYPE IS NULL ) OR ( TABLE2.TYPE = '*' ) )
      ) AND ( ( TABLE2.NBR IS NULL ) OR ( TABLE2.NBR = '*' ) )
     ) AND ( ( TABLE2.STAT = '01' ) OR ( TABLE2.STAT = '*' ) )
    ) AND ( ( TABLE2.ORGN IS NULL ) OR ( TABLE2.ORGN = '*' ) )
   ) AND ( TABLE2.NBR = '00000000' )
  ) AND ( TABLE2.PO IS NULL )
)

Both value expression and condition expression rules do backtracking.
In the example above every parenthesis nesting doubles memory requirements.

Are there any ways how to reduce/monitor memory requirements?
What exactly is remembered when backtracking?
I tried to add some syntactic predicates into value/conditional expression
but it usually led to failure of other tests.

thx Ivan




----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.



More information about the antlr-interest mailing list