[antlr-interest] ANTLR running out of memory during generation

Ron Hunter-Duvar ron.hunter-duvar at oracle.com
Fri Jan 29 20:51:40 PST 2010


I'm having a strange problem with ANTLR. I'm building a grammar for a 
language with a huge number (hundreds) of non-reserved keywords. I'm 
using the approach of having the lexer return a different token type for 
each keyword, and then having a parser rule of the form:

    id : ( ID | QUOTED_ID | KW_A | KW_B | ... | KW_ZZZ );

This was working great until today. In fact, ANTLR 3.2 generates 
surprisingly clever code for this: all the keywords are assigned 
consecutive token numbers, and the generated code just says:

    if ( (input.LA(1)>=KW_A && input.LA(1)<=KW_ZZZ) ||
         (input.LA(1)>=ID && input.LA(1)<=QUOTED_ID) ) {
        input.consume();
        ...
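
To make the range test concrete, here is a minimal stand-alone sketch of 
the same check. The token numbers are made up for illustration (the real 
ones are assigned by ANTLR), and IdRangeCheck and isId are hypothetical 
names, not anything ANTLR emits.

```java
public class IdRangeCheck {
    // Hypothetical token numbers, chosen only so the keywords are consecutive.
    static final int ID = 4;
    static final int QUOTED_ID = 5;
    static final int KW_A = 6;
    static final int KW_ZZZ = 636; // 631 consecutive keyword types: 6..636

    // Same shape as the generated test: two range comparisons instead of
    // one comparison per keyword.
    static boolean isId(int la1) {
        return (la1 >= KW_A && la1 <= KW_ZZZ)
            || (la1 >= ID && la1 <= QUOTED_ID);
    }

    public static void main(String[] args) {
        System.out.println("ID accepted:        " + isId(ID));
        System.out.println("last keyword:       " + isId(KW_ZZZ));
        System.out.println("one past the range: " + isId(KW_ZZZ + 1));
    }
}
```

The point is only that the cost of the test does not grow with the number 
of keywords, as long as their token numbers stay contiguous.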

This works all the way up to 631 keywords. ANTLR runs in about 20 
seconds, and never uses more than 269MB of memory. When I add a 632nd 
keyword (doesn't matter what the keyword is), and change nothing else, 
ANTLR runs for 2 minutes and runs out of heap space. I kept bumping the 
max space up, but even going to 2GB doesn't make any difference.

What's really interesting is that I was using ANTLR 3.1 until now. When 
I ran into this I upgraded to 3.2, but both of them fail at exactly the 
same spot, 632 keywords. Not surprisingly, the stack trace varies from 
one run to the next, depending on the exact point it runs out of memory, 
but it always has deeply nested calls to these and other methods:

    
org.antlr.stringtemplate.language.ASTExpr.writeTemplate(ASTExpr.java:750)
    org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:680)
    
org.antlr.stringtemplate.language.ASTExpr.writeAttribute(ASTExpr.java:660)
    
org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluator.java:86)
    org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149)
    org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705)

I don't know if it makes a difference, but I'm using backtracking 
(otherwise, this approach to non-reserved keywords doesn't work without 
a lot of synpreds), and outputting ASTs.

Since this is size related, it's hard to narrow it down to a simple 
example. I could try to duplicate it with just the id rule and nothing else.
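
A stripped-down test case could be generated rather than written by hand. 
The following is only a sketch of that idea, and I have not checked that 
it triggers the failure. The grammar name Repro, the keyword names KW_1 
through KW_n, the ID/QUOTED_ID/WS lexer rules, and the class name 
ReproGrammarGen are all placeholders, not taken from my real grammar.

```java
public class ReproGrammarGen {
    // Builds the text of an ANTLR 3 grammar containing only the id rule
    // and keywordCount keyword tokens. All names here are placeholders.
    static String generate(int keywordCount) {
        StringBuilder g = new StringBuilder();
        g.append("grammar Repro;\n");
        // Same options as described above: backtracking and AST output.
        g.append("options { backtrack=true; output=AST; }\n\n");

        // Parser rule: id : ( ID | QUOTED_ID | KW_1 | ... | KW_n );
        g.append("id : ( ID | QUOTED_ID");
        for (int i = 1; i <= keywordCount; i++) {
            g.append(" | KW_").append(i);
        }
        g.append(" );\n\n");

        // One lexer rule per keyword, listed before ID.
        for (int i = 1; i <= keywordCount; i++) {
            g.append("KW_").append(i).append(" : 'kw").append(i).append("' ;\n");
        }
        g.append("ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;\n");
        g.append("QUOTED_ID : '\"' (~'\"')* '\"' ;\n");
        g.append("WS : (' '|'\\t'|'\\r'|'\\n')+ { $channel=HIDDEN; } ;\n");
        return g.toString();
    }

    public static void main(String[] args) {
        // Default to 632, the count at which the real grammar starts failing.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 632;
        System.out.print(generate(n));
    }
}
```

Redirecting the output to Repro.g for 631 and then 632 keywords would show 
whether the keyword count alone is enough to trigger the problem, or 
whether the rest of the grammar matters too.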

Any ideas what might be happening here, and whether a fix might be possible?

Thanks,
Ron

-- 
Ron Hunter-Duvar | Software Developer V | 403-272-6580
Oracle Service Engineering
Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5

All opinions expressed here are mine, and do not necessarily represent
those of my employer.


