[antlr-interest] ANTLR running out of memory during generation

Jim Idle jimi at temporal-wave.com
Sat Jan 30 11:42:12 PST 2010


Ron,

First you really need to switch off backtracking unless the objective of your parser is to analyze SQL (you gave it away when you mentioned 632 keywords that can be identifiers). There are not as many predicates required as you think so long as you left factor everything.

Your tokens should be consecutive so long as you list them that way in the lexer. 

The problem might well be that although SQL sort of allows all keywords to be identifiers, it does not allow all because some of them would be to ambiguous even for a syntax directed hand crafted parser. If you turn on backtracking then try to allow one of these reserved words to be an identifier, then you will probably mask the issue because all warnings and errors are turned off. 

It is entirely feasible to create a full SQL parser without backtracking, very little look ahead and few predicates (all of the one or two token lookahead type). I have an online demo of T-SQL for instance on my web site at www.temporal-wave.com  (select 'online demos' link), and Oracle SQL/PLSQL will be up there before long too.

So, I think you will need to do the following to have a chance of generating the code:

1) Use -Xconversiontimeout 10000
2) Cause switches to be generated rather than ifs: -Xmaxswitchcaselabels 32000 -Xminswitchalts 1-xmaxinlineddfastates 65534
3) Use -Xmx2G when invoking the java command (assuming your jvm allows that)

But if you cannot get it going that way, then basically you are masking a bigger problem in your grammar that you are not seeing because of global backtracking. 

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Ron Hunter-Duvar
> Sent: Friday, January 29, 2010 8:52 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] ANTLR running out of memory during generation
> 
> I'm having a strange problem with ANTLR. I'm building a grammar for a
> language with a huge number (hundreds) of non-reserved keywords. I'm
> using the approach of having the lexer return a different token type
> for
> each keyword, and then having a parser rule of the form:
> 
>     id : ( ID | QUOTED_ID | KW_A | KW_B | ... | KW_ZZZ );
> 
> This was working great until today. In fact, ANTLR 3.2 generates
> surprisingly clever code for this - all the keywords are assigned
> consecutive token numbers, and generated code just says:
> 
>     if ( (input.LA(1)>=KW_A && input.LA(1)<=KW_ZZZ)||(input.LA(1)>=ID
> &&
> input.LA(1)<=QUOTED_ID) ) {
>         input.consume();
>         ...
> 
> This works all the way up to 631 keywords. ANTLR runs in about 20
> seconds, and never uses more than 269MB of memory. When I add a 632nd
> keyword (doesn't matter what the keyword is), and change nothing else,
> ANTLR runs for 2 minutes and runs out of heap space. I kept bumping the
> max space up, but even going to 2GB doesn't make any difference.
> 
> What's really interesting is that I was using ANTLR 3.1 until now. When
> I ran into this I upgraded to 3.2, but both of them fail at exactly the
> same spot, 632 keywords. Not surprisingly, the stack trace varies from
> one run to the next, depending on the exact point it runs out of
> memory,
> but it always has deeply nested calls to these and other methods:
> 
> 
> org.antlr.stringtemplate.language.ASTExpr.writeTemplate(ASTExpr.java:75
> 0)
>     org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:680)
> 
> org.antlr.stringtemplate.language.ASTExpr.writeAttribute(ASTExpr.java:6
> 60)
> 
> org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluato
> r.java:86)
>     org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149)
> 
> org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705)
> 
> I don't know if it makes a difference, but I'm using backtracking
> (otherwise, this approach to non-reserved keywords doesn't work without
> a lot of synpreds), and outputting ASTs.
> 
> Since this is size related, it's hard to narrow it down to a simple
> example. I could try to duplicate it with just the id rule and nothing
> else.
> 
> Any ideas what might be happening here, and whether a fix might be
> possible?
> 
> Thanks,
> Ron
> 
> --
> Ron Hunter-Duvar | Software Developer V | 403-272-6580
> Oracle Service Engineering
> Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5
> 
> All opinions expressed here are mine, and do not necessarily represent
> those of my employer.
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address





More information about the antlr-interest mailing list