[antlr-interest] ANTLR running out of memory during generation
Ron Hunter-Duvar
ron.hunter-duvar at oracle.com
Fri Jan 29 20:51:40 PST 2010
I'm having a strange problem with ANTLR. I'm building a grammar for a
language with a huge number (hundreds) of non-reserved keywords. I'm
using the approach of having the lexer return a different token type for
each keyword, and then having a parser rule of the form:
id : ( ID | QUOTED_ID | KW_A | KW_B | ... | KW_ZZZ );
This was working great until today. In fact, ANTLR 3.2 generates
surprisingly clever code for this: all the keywords are assigned
consecutive token numbers, and the generated code just says:
    if ( (input.LA(1)>=KW_A && input.LA(1)<=KW_ZZZ) ||
         (input.LA(1)>=ID && input.LA(1)<=QUOTED_ID) ) {
        input.consume();
        ...
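To illustrate why consecutive numbering matters here, the following is a minimal standalone sketch of the same membership test. The token numbers are made up for illustration (ANTLR assigns the real values), and the class and method names are hypothetical. With 631 keyword types numbered contiguously, the whole id alternative collapses to two range comparisons:

```java
// Sketch only: the token numbers below are invented for illustration;
// ANTLR assigns the real values when it generates the parser.
final class TokenRangeCheck {
    static final int ID = 4;
    static final int QUOTED_ID = 5;
    static final int KW_A = 6;      // first keyword token type
    static final int KW_ZZZ = 636;  // last keyword token type (6..636 = 631 keywords)

    /** Returns true if the token type is one the id rule accepts. */
    static boolean isIdToken(int type) {
        return (type >= KW_A && type <= KW_ZZZ)
            || (type >= ID && type <= QUOTED_ID);
    }

    public static void main(String[] args) {
        System.out.println(isIdToken(KW_A)); // a keyword used as an identifier
        System.out.println(isIdToken(3));    // some other token type
    }
}
```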
This works all the way up to 631 keywords. ANTLR runs in about 20
seconds, and never uses more than 269MB of memory. When I add a 632nd
keyword (it doesn't matter what the keyword is), and change nothing else,
ANTLR runs for 2 minutes and then runs out of heap space. I kept raising
the maximum heap size, but even going to 2GB doesn't make any difference.
What's really interesting is that both ANTLR 3.1 and 3.2 fail at exactly
the same spot, 632 keywords. (I was using 3.1 until now, and upgraded to
3.2 when I ran into this.) Not surprisingly, the stack trace varies from
one run to the next, depending on the exact point it runs out of memory,
but it always has deeply nested calls to these and other methods:
org.antlr.stringtemplate.language.ASTExpr.writeTemplate(ASTExpr.java:750)
org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:680)
org.antlr.stringtemplate.language.ASTExpr.writeAttribute(ASTExpr.java:660)
org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluator.java:86)
org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149)
org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705)
I don't know if it makes a difference, but I'm using backtracking
(otherwise, this approach to non-reserved keywords doesn't work without
a lot of synpreds), and outputting ASTs.
Since this is size related, it's hard to narrow it down to a simple
example. I could try to duplicate it with just the id rule and nothing else.
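One way such a reduced test case might be produced is with a small generator that emits a grammar containing only the id rule and a chosen number of keywords. This is a sketch under assumptions, not a confirmed reproduction: the grammar name, the keyword names (KW_0, KW_1, ...), the lexer rules for ID, QUOTED_ID and WS, and the class itself are all hypothetical, and whether the output triggers the same failure at 632 keywords is untested.

```java
// Sketch only: emits a hypothetical ANTLR 3 combined grammar with N keyword
// tokens and a single id rule. All rule and keyword names are invented.
final class KeywordGrammarGenerator {

    /** Builds grammar text with keywords KW_0 .. KW_(count-1). */
    static String generate(String grammarName, int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("grammar ").append(grammarName).append(";\n\n");
        // Same options the post mentions: backtracking and AST output.
        sb.append("options { backtrack=true; output=AST; }\n\n");

        // Parser rule: every keyword is also allowed as an identifier.
        sb.append("id : ( ID | QUOTED_ID");
        for (int i = 0; i < count; i++) {
            sb.append(" | KW_").append(i);
        }
        sb.append(" );\n\n");

        // Lexer rules: one token type per keyword, declared before ID.
        for (int i = 0; i < count; i++) {
            sb.append("KW_").append(i).append(" : 'kw").append(i).append("';\n");
        }
        sb.append("ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;\n");
        sb.append("QUOTED_ID : '\"' (~'\"')* '\"' ;\n");
        sb.append("WS : (' '|'\\t'|'\\r'|'\\n')+ { $channel=HIDDEN; } ;\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Keyword count from the command line; defaults to the failing size.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 632;
        System.out.print(generate("KwTest", count));
    }
}
```

Redirecting the output to KwTest.g for counts of 631 and 632 and running ANTLR on each file would show whether the id rule alone is enough to hit the limit, or whether the rest of the grammar is involved.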
Any ideas what might be happening here, and whether a fix might be possible?
Thanks,
Ron
--
Ron Hunter-Duvar | Software Developer V | 403-272-6580
Oracle Service Engineering
Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5
All opinions expressed here are mine, and do not necessarily represent
those of my employer.