[antlr-interest] problem about "the code for the static initializer is exceeding the 65535 bytes limit"

Wed Aug 15 18:04:13 PDT 2012

Hi Zhaohui,

You already know that you've discovered a theme which evokes some passion
in the ANTLR community.

There is a *lot* of wisdom in Jim Idle's suggestions.  Each one could be a
whole lecture.  If you take a class in compiler construction (or go back to
your notes, if you already had the class) you will see this up close.

My version is "preserve information; defer decisions as long as possible;
and make every decision as simple as possible."  If you do these things,
your language will be easy to maintain and extend.  And if you have users
for any length of time, these characteristics are probably high on your
list.  Hopefully you were not given a pathological language spec.

Good luck!
 On Aug 15, 2012 5:43 PM, "Zhaohui Yang" <yezonghui at gmail.com> wrote:

> sounds promising :)
>
> We have written a program to separate those constants into several inner
> classes, which solves the problem for now.
>
> Yours is definitely better:)
> On 2012-8-16 at 1:13 AM, "Francis ANDRE" <francis.andre.kampbell at orange.fr> wrote:
>
> >  On 15/08/2012 16:17, Zhaohui Yang wrote:
> >
> > It's great someone is already trying a fix. I'd be glad to test your fix
> > when it's out.
> >
> > Would you please describe briefly what kind of fix it is? Is it for
> > ANTLRWorks or the ANTLR tool? Is it a command-line option for separating
> > the FOLLOW sets or suppressing them, or something else?
> >
> > The 64K syndrome is purely a Java problem: the JVM does not support a
> > static initializer larger than 64K -- shame on it --. If you look at
> > the generated lexer and parser, you will certainly see a lot of DFA
> > classes, each of them with its own static initializer values. The point
> > is that the sum of the static initializers of all those DFAs is greater
> > than 64K, while the static initialization of each individual DFA is
> > fairly small, or in most cases less than 64K. Thus, one solution is to
> > extract all those DFA classes and put them outside the lexer or the
> > parser, in fixed directories following this pattern:
> >
> > Let <grammar> be the directory of the grammar to generate; then all the
> > generated DFAs will go in
> >
> > for the lexer's DFAs:    package <grammar>.lexer;
> > for the parser's DFAs:   package <grammar>.parser;
> >
> > and the references to all those DFAs will be
> > in the lexer:                 import <grammar>.lexer.*;
> > in the parser:                import <grammar>.parser.*;
> >
> > But hold on: the fix has to be approved by Ter, and I have not submitted
> > it yet. It needs to pass all the unit tests of ANTLR 3.4, and I am
> > working on it... There is a real challenge in getting the parser/lexer
> > compiled for Java code generated without a package, and all those unit
> > tests produce the Java parser/lexer in the top-level directory.
> >
> >
> > 2012/8/15 Francis ANDRE <francis.andre.kampbell at orange.fr>
> >
> >> Hi Zhaohui
> >>
> >> I am currently working on fixing this issue in ANTLR 3.4... Once I
> >> have a proper patch, would you be interested in testing it?
> >>
> >> FA
> >> On 14/08/2012 18:05, Zhaohui Yang wrote:
> >>
> >>> Hi,
> >>>
> >>> We have a big grammar here, and the generated parser.java fails to
> >>> compile with the error: "the code for the static initializer is
> >>> exceeding the 65535 bytes limit".
> >>>
> >>> I've searched the net for a while and found that this is a widely known
> >>> limit in the JVM or the javac compiler, and that there is not yet an
> >>> option to raise it.
> >>>
> >>> On the ANTLR side, I found 2 solutions proposed by others, but neither
> >>> of them is totally satisfying:
> >>>
> >>> 1. Separate the big grammar into 2 *.g files, and import one from the
> >>> other.
> >>>     Yes, this removes the compilation error in the generated Java. But
> >>> ANTLRWorks does not support imported grammars well. E.g., I cannot
> >>> interpret a rule in the imported grammar; it is simply not in the rule
> >>> list for interpreting. And gunit always fails on rules defined in the
> >>> imported grammar.
> >>>
> >>> 2. Modify the generated Java source: separate the "FOLLOW_xxx_in_yyy"
> >>> constants into several static classes and change references to them
> >>> accordingly.
> >>>     This is proposed here -
> >>> http://www.antlr.org/pipermail/antlr-interest/2009-November/036608.html.
> >>> The author of that post actually has a solution in the ANTLR source code
> >>> (a string template), but I can't find the attachment he referred to. And
> >>> that was in 2009, so I suspect the fix could be incompatible with the
> >>> current ANTLR version.
> >>>     Without this fix we have to do the modification manually or write a
> >>> script for it, and the script is not that easy to write.
> >>>
> >>> We also found a 3rd solution ourselves, which likewise involves
> >>> changing the generated Java:
> >>>
> >>> 3. Remove those FOLLOW_... constants completely, and replace the
> >>> references to them with "null".
> >>>     Surprisingly this works; there is just no error recovery afterwards,
> >>> which is not a problem for us. But we really worry that this is unsafe,
> >>> since it's not documented anywhere.
> >>>
> >>> All in all, we're looking for any other solution that is easier to
> >>> apply, assuming we'll be constantly changing the grammar and recompiling
> >>> the parser.
> >>>
> >>> Maybe there is a way to get ANTLRWorks and gunit to play well with
> >>> imported grammars?
> >>> Maybe there is already a command-line option for the ANTLR tool that can
> >>> generate the FOLLOW_... constants in separate classes?
> >>> Maybe there is already a command-line option for the ANTLR tool that can
> >>> suppress code generation for the FOLLOW_... constants?
> >>>
> >>>
> >>
> >
> >
> > --
> > Regards,
> >
> > Yang, Zhaohui
> >
> >
> >
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
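
For readers of the archive, here is a minimal sketch of the workaround discussed in the thread: moving the FOLLOW_... constants into several nested classes. It works because the 65535-byte limit applies to the code of a single method, and each class gets its own static initializer method. This is not ANTLR's generated output. The names SplitInitializerSketch, Bits, FollowSets1, FollowSets2 and the constant values are hypothetical, and Bits is only a stand-in for the ANTLR 3 runtime BitSet so that the example compiles on its own.

```java
public class SplitInitializerSketch {

    // Hypothetical stand-in for the ANTLR 3 runtime BitSet class, so the
    // sketch compiles without ANTLR on the classpath.
    public static final class Bits {
        final long[] words;
        Bits(long[] words) { this.words = words; }
    }

    // Each nested class is compiled to its own class file with its own
    // static initializer, so the constants no longer share one method.
    public static final class FollowSets1 {
        public static final Bits FOLLOW_expr_in_stat =
                new Bits(new long[]{0x0000000000000002L});
        public static final Bits FOLLOW_ID_in_stat =
                new Bits(new long[]{0x0000000000000010L});
    }

    public static final class FollowSets2 {
        public static final Bits FOLLOW_INT_in_atom =
                new Bits(new long[]{0x0000000000000020L});
    }

    // In the parser, a reference such as FOLLOW_expr_in_stat would become
    // FollowSets1.FOLLOW_expr_in_stat.
    public static long firstWord(Bits bits) {
        return bits.words[0];
    }

    public static void main(String[] args) {
        System.out.println(firstWord(FollowSets1.FOLLOW_expr_in_stat));
        System.out.println(firstWord(FollowSets2.FOLLOW_INT_in_atom));
    }
}
```

How many constants fit in one holder class depends on the grammar, so a post-processing script would have to pick a chunk size and rewrite every reference accordingly.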