[antlr-interest] Memory requirements of C runtime when backtracking

Jim Idle jimi at temporal-wave.com
Wed May 4 08:16:10 PDT 2011


You should not be using backtrack=true if you are short on memory, but
without seeing your grammar I cannot comment on the ram usage. It might be
that your grammar causes the generation of huge DFA tables. Backtracking
itself does not cost lots of memory though.

jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Ivan Brezina
> Sent: Wednesday, May 04, 2011 5:20 AM
> To: antlr-interest Interest
> Subject: [antlr-interest] Memory requirements of C runtime when
> backtracking
>
> Hi all
> doing some unit test of Oracle SQL grammar I'm facing problems with
> memory requirements when parsing input having many parenthesis.
>
> In general the grammar can have three types of statements which can be
> enclosed in parens.
>
> 1. value expression like (A),(1+2).
> For an sql_expression there is set of rules using numerical operator
> sql_expression -> expr_add -> expr_mul -> expr_sign -> expr_pow ->
> expr_paren
>
> 2. logical expression like (A > 2) or (A is null) For an condition
> expression there is a set of rules using locical operator sql_condition
> -> condition_or -> condition_and -> condition_not -> condition_paren
>
> 3. a subquery. It is an sql statement enclosed in parenthesis like
> (SELECT * FROM dual).
>
> These 3 types can by combined in many ways.
> For example ((SELECT count(1) from dual) > 1)
>
> While testing if found queries - probably generated by some evil sick
> robot - whose require more than 8B of RAM to parse even they are quite
> short
>
> For example:
> SELECT *
> FROM TABLE1, TABLE2
> WHERE
> (
>   (
>    (
>     (
>      (
>       (
>        (
>         (
>          (
>   	 (
>   	  (
>   	   (
>   	    (
>   	     (
>   	      (
> 	       ( TABLE2.DT = '2' ) OR ( TABLE2.DT = '3' )
>    	      ) AND ( TABLE2.CODE < 9 )
>    	     ) AND ( TABLE2.WH = 'XXX' )
>    	    ) AND ( TABLE1.ID = '0000001' )
>    	   ) AND ( ( TABLE1.ATTR_1 IS NULL ) OR ( TABLE1._ATTR1 = '*' ) )
>    	  ) AND ( ( TABLE1.ATTR_2 IS NULL ) OR ( TABLE1._ATTR2 = '*' ) )
>    	 ) AND ( ( TABLE1.ATTR_3 IS NULL ) OR ( TABLE1._ATTR3 = '*' ) )
>       	) AND ( ( TABLE1.ATTR_4 IS NULL ) OR ( TABLE1._ATTR4 = '*'
)
> )
>         ) AND ( ( TABLE1.ATTR_5 IS NULL ) OR ( TABLE1._ATTR5 = '*' ) )
>        ) AND ( ( TABLE2.TYPE IS NULL ) OR ( TABLE2.TYPE = '*' ) )
>       ) AND ( ( TABLE2.NBR IS NULL ) OR ( TABLE2.NBR = '*' ) )
>      ) AND ( ( TABLE2.STAT = '01' ) OR ( TABLE2.STAT = '*' ) )
>     ) AND ( ( TABLE2.ORGN IS NULL ) OR ( TABLE2.ORGN = '*' ) )
>    ) AND ( TABLE2.NBR = '00000000' )
>   ) AND ( TABLE2.PO IS NULL )
> )
>
> Both value expression and condition expression rules do backtracking.
> In the example above every parenthesis nesting doubles memory
> requirements.
>
> Are there any ways how to reduce/monitor memory requirements?
> What exactly is remembered when backtracking?
> I tried to add some syntactic predicates into value/conditional
> expression but it usually led to failure of other tests.
>
> thx Ivan
>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address


More information about the antlr-interest mailing list