[antlr-interest] Generated parser class too large to compile

Ron Hunter-Duvar ron.hunter-duvar at oracle.com
Thu Feb 25 14:35:29 PST 2010


Hi,

I'm running into a problem with the Java parser class generated by Antlr 
3.2 being too large to compile. I don't think there's anything wrong 
with my grammar or with Antlr, it's simply the size and complexity of 
the grammar. It's already 2,500 lines of code, 208 rules, and Antlr 
generates 68,000 lines of output. This is just the parser grammar (the 
lexer grammar is separate and isn't a problem), and I'm not done yet. 
The problem is that Java is not an ideal language target for code 
generation, given its 64KB limit on the bytecode of each method (and 
various other 64K limits per class), due to the class file format using 
16-bit sizes and indices 
(http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#88659). 


I've been able to work around the problem with a poor man's refactoring, 
a Perl script that breaks out the one generated class into interfaces 
for the constants (tokens, DFA initializations) and an abstract 
superclass for the DFA nested classes and methods and stubs for all the 
other methods. This is working, but as I continue I have to keep 
refining it to do more refactoring. It's really a kludge, and only works 
by relying on the specific structure and formatting of the Antlr output.
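
As a rough illustration of the shape of that split, here is a minimal 
sketch in Java. All names (SketchParserConstants, SketchParserBase, 
SketchParser, the DFA1_* tables) are hypothetical, and the toy table 
lookup only stands in for the real DFA code; this is not actual Antlr 
output or the output of the Perl script.

```java
// Illustrative only: every name here is hypothetical, not real Antlr output.

// 1. Constants (token types, DFA tables) live in an interface, so the
//    array initializers are compiled into the interface's class file.
interface SketchParserConstants {
    int ID = 4;
    int INT = 5;
    // Toy transition table: row = state, column = token type - ID.
    int[][] DFA1_TRANSITION = { { 1, 2 }, { -1, -1 }, { -1, -1 } };
    // Alternative predicted by each state (0 = not an accepting state).
    int[] DFA1_ACCEPT = { 0, 1, 2 };
}

// 2. Prediction code lives in an abstract superclass, with abstract
//    stubs for the rule methods that are implemented in the subclass.
abstract class SketchParserBase implements SketchParserConstants {
    protected int predictDfa1(int tokenType) {
        int column = tokenType - ID;
        if (column < 0 || column >= DFA1_TRANSITION[0].length) {
            return -1; // no viable alternative
        }
        int state = DFA1_TRANSITION[0][column];
        return state < 0 ? -1 : DFA1_ACCEPT[state];
    }

    public abstract String atom(int tokenType);
}

// 3. The remaining concrete class holds only the rule methods.
class SketchParser extends SketchParserBase {
    @Override
    public String atom(int tokenType) {
        switch (predictDfa1(tokenType)) {
            case 1:
                return "identifier";
            case 2:
                return "integer";
            default:
                throw new IllegalArgumentException(
                        "no viable alternative for token type " + tokenType);
        }
    }
}
```

Calling new SketchParser().atom(SketchParserConstants.ID) goes through 
the superclass's prediction method and the interface's tables, so the 
code is spread over three class files instead of one.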

I'm thinking that a more general solution would be to modify the code 
generation to generate factored code. I've only looked briefly at it so 
far, but since it's all driven by StringTemplate templates and already 
accommodates multiple output languages, it shouldn't be too difficult to 
adapt it. I would probably create a new back-end "language" such as 
"FactoredJava", based on the Java templates. That would make switching 
between the standard one and mine a simple grammar option change. Does 
anyone see a problem with this plan? Any suggestions?

The only other alternative I see is to switch to a back-end language 
that doesn't have this limitation. But that creates quite a bit of 
rework (replacing semantic predicates and action code, and the 
subclasses of standard Antlr runtime classes that I've created to 
customize the behaviour), as well as integration issues with all the 
other Java code.

Is there anything I'm missing here? Any Antlr options that would 
significantly reduce the size of the generated code?

Thanks,
Ron

-- 
Ron Hunter-Duvar | Software Developer V | 403-272-6580
Oracle Service Engineering
Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5

All opinions expressed here are mine, and do not necessarily represent
those of my employer.


