[antlr-interest] problem about "the code for the static initializer is exceeding the 65535 bytes limit"

Jim Idle jimi at temporal-wave.com
Wed Aug 15 20:25:19 PDT 2012


Thanks Kyle. 

BTW guys, you might not want to publish your grammars to the world, but if you want to send them to me privately I will give you a few pointers for free. I have even been known to accept paid gigs, though that does not seem to have happened for a while in this economy ;$

Jim

On Aug 15, 2012, at 6:04 PM, Kyle Ferrio <kferrio at gmail.com> wrote:

> Hi Zhaohui,
> 
> You already know that you've discovered a theme which evokes some passion
> in the ANTLR community.
> 
> There is a *lot* of wisdom in Jim Idle's suggestions.  Each one could be a
> whole lecture.  If you take a class in compiler construction (or go back to
> your notes, if you already had the class) you will see this up close.
> 
> My version is "preserve information; defer decisions as long as possible;
> and make every decision as simple as possible."  If you do these things,
> your language will be easy to maintain and extend.  And if you have users
> for any length of time, these characteristics are probably high on your
> list.  Hopefully you were not given a pathological language spec.
> 
> Good luck!
> On Aug 15, 2012 5:43 PM, "Zhaohui Yang" <yezonghui at gmail.com> wrote:
> 
>> sounds promising :)
>> 
>> We have written a program to separate those constants into several inner
>> classes, which solves the problem for now.
>> 
>> Yours is definitely better:)
>> On 2012-8-16 at 1:13 AM, "Francis ANDRE" <francis.andre.kampbell at orange.fr> wrote:
>> 
>>> On 15/08/2012 at 16:17, Zhaohui Yang wrote:
>>> 
>>> It's great someone is already trying a fix. I'd be glad to test your fix
>>> when it's out.
>>> 
>>> Would you please describe what kind of fix it is? Is it for
>>> ANTLRWorks or the ANTLR tool? Is it a command-line option for separating
>>> the FOLLOW sets or suppressing them, or something else?
>>> 
>>> The 64K syndrome is a pure Java problem, due to the constraint that the
>>> JVM does not support a static initializer larger than 64K  -- shame on it --.
>>> Thus if you look at the generated lexer and parser, you will certainly see
>>> a lot of DFA classes, each of them having some static initializer values.
>>> The point is that the sum of the static initializers of all those DFAs is
>>> greater than 64K, while the static initialization of each DFA is fairly
>>> small, or in most cases less than 64K. Thus, one solution is to extract all
>>> those DFA classes and put them outside the lexer or the parser in fixed
>>> directories, following this pattern:
>>> 
>>> Let <grammar> be the directory of the grammar to generate; then all the
>>> generated DFAs will go in
>>> 
>>> for the lexer's DFAs:    package <grammar>.lexer;
>>> for the parser's DFAs:   package <grammar>.parser;
>>> 
>>> and the references to all those DFAs will be
>>> in the lexer:                 import <grammar>.lexer.*;
>>> in the parser:                import <grammar>.parser.*;
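
As a minimal sketch of why moving table data into separate classes helps: each Java class gets its own static initializer, and the 65535-byte limit applies to each initializer separately. The names below (SketchParser, Dfa1Tables, Dfa2Tables, ACCEPT) are hypothetical and the tables are tiny placeholders; this is not what ANTLR or the patch under discussion actually generates.

```java
// Illustrative only: class and field names are hypothetical, not ANTLR output.

// Stands in for the generated parser. It holds no table data of its own,
// so its static initializer stays small.
class SketchParser {
    static int acceptOf(int dfa, int state) {
        return dfa == 1 ? Dfa1Tables.ACCEPT[state] : Dfa2Tables.ACCEPT[state];
    }

    public static void main(String[] args) {
        System.out.println(acceptOf(1, 2)); // prints 1
        System.out.println(acceptOf(2, 1)); // prints 3
    }
}

// Each class below compiles to its own class file with its own static
// initializer, so its array data counts only against its own limit.
class Dfa1Tables {
    static final short[] ACCEPT = {-1, -1, 1, 2};
}

class Dfa2Tables {
    static final short[] ACCEPT = {-1, 3, -1};
}
```

In a real layout the table classes would presumably live in the separate packages named above rather than in one file; they are kept together here only so the sketch compiles on its own.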
>>> 
>>> But hold on, the fix has to be approved by Terr and I have not yet submitted
>>> it. It needs to pass all the unit tests of ANTLR 3.4 and I am working on
>>> it... There is a real challenge in getting the parser/lexer compiled for
>>> Java code generated without a package, and all those unit tests
>>> produce the Java parser/lexer in the top-level directory.
>>> 
>>> 
>>> 2012/8/15 Francis ANDRE <francis.andre.kampbell at orange.fr>
>>> 
>>>> Hi Zhaohui
>>>> 
>>>> I am currently working on fixing this issue with antlr3.4... Once I
>>>> have a proper patch, would you be interested in testing it?
>>>> 
>>>> FA
>>>> On 14/08/2012 at 18:05, Zhaohui Yang wrote:
>>>> 
>>>> Hi,
>>>>> 
>>>>> Here we have a big grammar, and the generated parser.java got a
>>>>> compilation error: "the code for the static initializer is exceeding
>>>>> the 65535 bytes limit".
>>>>> 
>>>>> I've searched the net for a while and found that this is a widely known
>>>>> limit in the JVM or the javac compiler, and there is not yet an option
>>>>> to raise it.
>>>>> 
>>>>> On the ANTLR side, I found 2 solutions proposed by others, but neither
>>>>> of them is totally satisfying:
>>>>> 
>>>>> 1. Separate the big grammar into 2 *.g files, and import one from the
>>>>>    other.
>>>>>    Yes, this removes the compilation error in the generated Java. But
>>>>>    ANTLRWorks does not support imported grammars well. E.g., I cannot
>>>>>    interpret a rule in the imported grammar; it's simply not in the rule
>>>>>    list for interpreting. And gunit always fails with rules defined in
>>>>>    the imported grammar.
>>>>> 
>>>>> 2. Modify the generated Java source: separate the "FOLLOW_xxx_in_yyy"
>>>>>    constants into several static classes and change references to them
>>>>>    accordingly.
>>>>>    This is proposed here -
>>>>>    http://www.antlr.org/pipermail/antlr-interest/2009-November/036608.html.
>>>>>    The author of the post actually has a solution in the ANTLR source
>>>>>    code (some string template). But I can't find the attachment he
>>>>>    referred to. And that was in 2009; I suspect the fix could be
>>>>>    incompatible with the current ANTLR version.
>>>>>    Without this fix we have to do the modification manually or write a
>>>>>    script for it. The script is not that easy.
>>>>> 
>>>>> And we found a 3rd solution ourselves, which also involves changing the
>>>>> generated Java:
>>>>> 
>>>>> 3. Remove those FOLLOW_... constants completely, and replace the
>>>>>    references with "null".
>>>>>    Surprisingly this works; there is just no error recovery after this,
>>>>>    which is not a problem for us. But we really worry this is unsafe,
>>>>>    since it's not documented anywhere.
>>>>> 
>>>>> All in all, we're looking for any other solution that is easier to
>>>>> apply, assuming we'll be constantly changing the grammar and recompiling
>>>>> the parser.
>>>>> 
>>>>> Maybe there is a way to get ANTLRWorks and gunit to play well with
>>>>> imported grammars?
>>>>> Maybe there is already a command-line option for the ANTLR Tool that can
>>>>> generate the FOLLOW_... constants in separate classes?
>>>>> Maybe there is already a command-line option for the ANTLR Tool that can
>>>>> suppress FOLLOW_... constant code generation?
>>> 
>>> 
>>> --
>>> Regards,
>>> 
>>> Yang, Zhaohui
>> 
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 

