[antlr-interest] Memory requirements of C runtime when backtracking

Ivan Brezina ibre5041 at ibrezina.net
Fri May 6 08:10:00 PDT 2011


You're right the grammar has very large DFAs. That was the second  
problem I'm facing. In
general Oracle's SQL distinguishes reserved words and keywords.

In order to satisfy backward compatibility many newly added words(keywords)
have special meaning and but also can be used as identifiers.
for example such a query is legal:
Select * from join JOIN join using(join);

The uppercase JOIN is a keyword while other "join"s are identifiers.
In my grammar the rule identifier is defined as
identifier
   : ID //(lexer rule for an identifier)
   | 'A' //(1st keyword)
   | 'AS' //(2dn keyword)
   ...
   | '' // (200th keyword)
   ;

I'm afraid that this is one the reasons why the tables are so huge.
Are there any ways how to deal with something like that?

The previous version of this grammar used gated semantic predicates like:

k_join : { LA(1).gettext() == "JOIN" }? => ID

is that the right way to reduce the size of DFA?

Ivan

Quoting Jim Idle <jimi at temporal-wave.com>:

> You should not be using backtrack=true if you are short on memory, but
> without seeing your grammar I cannot comment on the ram usage. It might be
> that your grammar causes the generation of huge DFA tables. Backtracking
> itself does not cost lots of memory though.
>
> jim
>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of Ivan Brezina
>> Sent: Wednesday, May 04, 2011 5:20 AM
>> To: antlr-interest Interest
>> Subject: [antlr-interest] Memory requirements of C runtime when
>> backtracking
>>
>> Hi all
>> doing some unit test of Oracle SQL grammar I'm facing problems with
>> memory requirements when parsing input having many parenthesis.
>>
>> In general the grammar can have three types of statements which can be
>> enclosed in parens.
>>
>> 1. value expression like (A),(1+2).
>> For an sql_expression there is set of rules using numerical operator
>> sql_expression -> expr_add -> expr_mul -> expr_sign -> expr_pow ->
>> expr_paren
>>
>> 2. logical expression like (A > 2) or (A is null) For an condition
>> expression there is a set of rules using locical operator sql_condition
>> -> condition_or -> condition_and -> condition_not -> condition_paren
>>
>> 3. a subquery. It is an sql statement enclosed in parenthesis like
>> (SELECT * FROM dual).
>>
>> These 3 types can by combined in many ways.
>> For example ((SELECT count(1) from dual) > 1)
>>
>> While testing if found queries - probably generated by some evil sick
>> robot - whose require more than 8B of RAM to parse even they are quite
>> short
>>
>> For example:
>> SELECT *
>> FROM TABLE1, TABLE2
>> WHERE
>> (
>>   (
>>    (
>>     (
>>      (
>>       (
>>        (
>>         (
>>          (
>>   	 (
>>   	  (
>>   	   (
>>   	    (
>>   	     (
>>   	      (
>> 	       ( TABLE2.DT = '2' ) OR ( TABLE2.DT = '3' )
>>    	      ) AND ( TABLE2.CODE < 9 )
>>    	     ) AND ( TABLE2.WH = 'XXX' )
>>    	    ) AND ( TABLE1.ID = '0000001' )
>>    	   ) AND ( ( TABLE1.ATTR_1 IS NULL ) OR ( TABLE1._ATTR1 = '*' ) )
>>    	  ) AND ( ( TABLE1.ATTR_2 IS NULL ) OR ( TABLE1._ATTR2 = '*' ) )
>>    	 ) AND ( ( TABLE1.ATTR_3 IS NULL ) OR ( TABLE1._ATTR3 = '*' ) )
>>       	) AND ( ( TABLE1.ATTR_4 IS NULL ) OR ( TABLE1._ATTR4 = '*'
> )
>> )
>>         ) AND ( ( TABLE1.ATTR_5 IS NULL ) OR ( TABLE1._ATTR5 = '*' ) )
>>        ) AND ( ( TABLE2.TYPE IS NULL ) OR ( TABLE2.TYPE = '*' ) )
>>       ) AND ( ( TABLE2.NBR IS NULL ) OR ( TABLE2.NBR = '*' ) )
>>      ) AND ( ( TABLE2.STAT = '01' ) OR ( TABLE2.STAT = '*' ) )
>>     ) AND ( ( TABLE2.ORGN IS NULL ) OR ( TABLE2.ORGN = '*' ) )
>>    ) AND ( TABLE2.NBR = '00000000' )
>>   ) AND ( TABLE2.PO IS NULL )
>> )
>>
>> Both value expression and condition expression rules do backtracking.
>> In the example above every parenthesis nesting doubles memory
>> requirements.
>>
>> Are there any ways how to reduce/monitor memory requirements?
>> What exactly is remembered when backtracking?
>> I tried to add some syntactic predicates into value/conditional
>> expression but it usually led to failure of other tests.
>>
>> thx Ivan
>>
>>
>>
>>
>> ----------------------------------------------------------------
>> This message was sent using IMP, the Internet Messaging Program.
>>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>> email-address
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:   
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>



----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OracleSQL.zip
Type: application/zip
Size: 35052 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20110506/af4ae21e/attachment.zip 


More information about the antlr-interest mailing list