[antlr-interest] ANTLRWorks 1.4.3: XYZParser.java:14: code too large (public static final String[] tokenNames = new String[] { ... } ; )

Vlad vlad at demoninsight.com
Mon Sep 26 12:36:41 PDT 2011


The Java .class file format specifies that the bytecode of any single method must not exceed 64 KB. If this limit is violated (which normally happens only with auto-generated code), you get the compiler error you are seeing.

Although it is not widely known, Java code of the form

... static final SomeType SOME_FIELD = <some exp>

is equivalent to the following:

a declaration of 

... static final SomeType SOME_FIELD;

combined with bytecode that computes <some exp> and assigns it to SOME_FIELD, executed inside a special method named '<clinit>' when the class is initialized. It is the same method that also collects anything you put in a static { } block. (The exception is a compile-time constant of primitive or String type, which the compiler stores as a constant and does not initialize in <clinit>.) I had a look at the .java files generated from your grammar, and there are a great many public static finals that require such <clinit> code: some 1300 FOLLOW_... BitSets, tokenNames, ruleNames, etc.
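To make the equivalence concrete, here is a minimal sketch. The class and field names (ClinitDemo, Inline, Explicit, NAMES) are made up for illustration and are not taken from the generated parser. Both nested classes should compile to essentially the same class initializer; disassembling them with javap -c should show the array construction inside a "static {}" section in both cases.

```java
import java.util.Arrays;

public class ClinitDemo {
    // Form 1: declaration with an initializer expression.
    // The array-building bytecode is placed in Inline's <clinit>.
    static class Inline {
        public static final String[] NAMES = new String[] { "<invalid>", "<EOR>", "ID" };
    }

    // Form 2: the same thing spelled out by hand: a bare declaration
    // plus an assignment inside a static block, which also lands in <clinit>.
    static class Explicit {
        public static final String[] NAMES;
        static {
            NAMES = new String[] { "<invalid>", "<EOR>", "ID" };
        }
    }

    public static void main(String[] args) {
        // Both forms produce the same contents; prints "true".
        System.out.println(Arrays.equals(Inline.NAMES, Explicit.NAMES));
    }
}
```

Every element of such an array costs several bytecode instructions (push index, load constant, store), which is why long initializer lists add up quickly.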

All of those static field initialization bytecodes end up in <clinit> and push it over the size limit. It seems to me that hitting the 64K limit can happen for any reasonably large grammar ("large" defined not just in terms of token count, but also the number of rules, etc).

To address this issue at the fundamental level, ANTLR needs to alter its .java code emission strategy. Perhaps it could map rules to static methods of static nested classes instead of lumping everything into a single .class definition. Nested classes in Java are compiled into separate .class definitions, each with its own <clinit>.
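As a rough sketch of that idea (this is not what ANTLR generates today; XYZParserSketch, TokenData, FollowSets and the helper methods are hypothetical names, and plain long[] arrays stand in for the runtime's BitSet so the example is self-contained), the static data could be grouped into nested holder classes:

```java
public class XYZParserSketch {
    // Compiled to XYZParserSketch$TokenData.class with its own <clinit>,
    // so this initializer counts against a separate 64 KB budget.
    static final class TokenData {
        static final String[] tokenNames = new String[] {
            "<invalid>", "<EOR>", "<DOWN>", "<UP>", "ID", "INT"
        };
    }

    // A second holder, again with its own <clinit>. A real parser could
    // spread its FOLLOW sets over several such classes.
    static final class FollowSets {
        static final long[] FOLLOW_expr_in_stat = new long[] { 0x0000000000000010L };
        static final long[] FOLLOW_stat_in_prog = new long[] { 0x0000000000000030L };
    }

    // The outer class refers to the data through the holder classes.
    static String tokenName(int type) {
        return TokenData.tokenNames[type];
    }

    // Tests whether bit 'type' is set in a bit set stored as 64-bit words.
    static boolean follows(long[] set, int type) {
        return (set[type >>> 6] & (1L << (type & 63))) != 0;
    }

    public static void main(String[] args) {
        System.out.println(tokenName(4));                               // ID
        System.out.println(follows(FollowSets.FOLLOW_expr_in_stat, 4)); // true
    }
}
```

With this layout no single class initializer has to hold all of the array and bit set construction code, at the cost of a few extra .class files.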

HTH,
Vlad

On Sep 26, 2011, at 1:06 PM, Udo Weik wrote:

> Hello Terence,
> 
>> Interesting. That's not that big. Only 162 strings should not be enough to blow out the 64k static INIT method limit. hmm... perhaps the other arrays are big as well.
>> ter
> 
> I just tried to delete 'static' but of course that doesn't work:
> XYZParser.java:347: non-static variable tokenNames cannot be referenced from a static context
> So the question is - any solution for that problem?
> 
> Thanks and greetings
> Udo
> 
> 
>> On Sep 26, 2011, at 8:42 AM, Udo Weik wrote:
>> 
>>> Hello Terence,
>>> 
>>>> wow! how big is that grammar?
>>>> Ter
>>> 
>>> I'm trying to get the VHDL-grammar for the CSharp-target from
>>> Mike Lodder working with Java:
>>> http://www.antlr.org/grammar/1202750770887/vhdl.g
>>> 
>>> For some first, very basic modifications, see the attachment.
>>> First of all that grammar should work with ANTLRWorks 1.4.3.
>>> 
>>> Many thanks for any support
>>> Udo
>>> 
>>> 
>>>> On Sep 26, 2011, at 6:50 AM, Udo Weik wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> the length of that line is 1647 chars (162 strings).
>>>>> The grammar is an existing one. What can/must I do?
>>>>> 
>>>>> [15:41:27] XYZParser.java:14: code too large
>>>>> [15:41:27]     public static final String[] tokenNames = new String[] {
>>>>> [15:41:27]                                  ^
>>>>> [15:41:27] 1 error
>>>>> 
>>>>> 
>>>>> Many thanks and greetings
>>>>> Udo
>>>>> 
>>>>> 
>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>> 
>>> 
>>> <vhdl__UW1a.g>
>> 
> 
> 


