[antlr-interest] problem about "the code for the static initializer is exceeding the 65535 bytes limit"

Wed Aug 15 11:38:54 PDT 2012

It does not need a fix. It is the grammar that needs to be improved. The
huge DFAs are indicative of your grammars being overly complicated or poorly
left factored. ANTLR might do better than it does in some cases, and v4 may
well get around a lot of similar issues, but in general, improve your
grammar files.

First, look at the generated DFA. What rule, or combination of rules, is
generating this? Start there. Left factor. Simplify. Stop trying to do much
of anything in the lexer other than match the simplest common token set.
Stop trying to impose semantics in the parser ("you can only have at most
two of 'these' here"). Push such things into the tree walk, or add semantic
checks: allow any number of 'these', count how many you got, then issue a
semantic error.
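
As a rough illustration of that last point, here is a minimal Java sketch of
a check that runs after parsing. The grammar is assumed to accept any number
of items; the class name, the limit of two, and the error text are all
hypothetical and not part of the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR: the grammar accepts any number of
// items, and the limit is enforced afterwards as a semantic check.
final class ItemLimitCheck {
    // Assumed language rule, for illustration only.
    static final int MAX_ITEMS = 2;

    // Returns semantic error messages; an empty list means the input is fine.
    static List<String> check(String ruleName, List<String> items) {
        List<String> errors = new ArrayList<>();
        if (items.size() > MAX_ITEMS) {
            errors.add(ruleName + ": at most " + MAX_ITEMS
                    + " items allowed here, found " + items.size());
        }
        return errors;
    }
}
```

A rule action or tree walker would collect the matched items into a list and
call ItemLimitCheck.check(...) once the rule has finished matching, so the
grammar itself stays simple.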

Writing good grammars is not easy. In some ways, because it is easy to just
type stuff in and give it a whirl, ANTLR can cause you to shoot yourself in
the foot!

Step back and consider your grammar files. Do you really want a grammar that
generates such huge decision tables? What is going wrong? It usually is not
ANTLR itself.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Francis ANDRE
> Sent: Wednesday, August 15, 2012 10:14 AM
> To: Zhaohui Yang
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] problem about "the code for the static
> initializer is exceeding the 65535 bytes limit"
>
> On 15/08/2012 16:17, Zhaohui Yang wrote:
> > It's great someone is already trying a fix. I'd be glad to test your
> > fix when it's out.
> >
> > Would you please describe what kind of fix it is? Is it for
> > ANTLRWorks or the ANTLR tool? Is it a command-line option for
> > separating the FOLLOW sets or suppressing them, or something else?
> The 64K syndrome is a pure Java problem: the JVM does not support a
> static initializer larger than 64K -- shame on it --. If you look at
> the generated lexer and parser, you will certainly see a lot of DFA
> classes, each of them having some static initializer values. The point
> is that the sum of the static initializers of all those DFAs is
> greater than 64K, while the static initialization of each DFA is
> fairly small, or in most cases less than 64K. Thus, one solution is
> to extract all those DFA classes and put them outside the lexer or
> the parser, in fixed directories following this pattern:
>
> Let <grammar> be the directory of the grammar to generate; then all
> the generated DFAs will go in
>
> for the lexer's DFAs:    package <grammar>.lexer;
> for the parser's DFAs:   package <grammar>.parser;
>
> and the references to all those DFAs will be
> in the lexer:                 import <grammar>.lexer.*;
> in the parser:                import <grammar>.parser.*;
>
> But hold on: the fix has to be approved by Terr, and I have not yet
> submitted it. It needs to pass all the unit tests of ANTLR 3.4, and I
> am working on it... There is a real challenge in getting the
> parser/lexer compiled for Java code generated without a package, and
> all those unit tests produce the Java parser/lexer in the top-level
> directory.
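
To illustrate why moving the tables out of the main class gets around the
limit: the 65535-byte limit applies to the code of a single method, and
every class, nested or top-level, gets its own static initializer. The
following is only a minimal sketch, not the proposed patch. The class and
constant names are hypothetical, and plain long[] arrays stand in for the
BitSet objects that generated ANTLR 3 parsers actually use.

```java
// Hypothetical stand-in for a generated parser; all names are illustrative.
final class SketchParser {
    // Each nested class compiles to its own class file with its own static
    // initializer, so its constants no longer count against the outer
    // class's 64K budget.
    static final class FollowSetsA {
        static final long[] FOLLOW_expr_in_stat = {0x0000000000000012L};
        static final long[] FOLLOW_ID_in_stat = {0x0000000000000040L};
    }

    static final class FollowSetsB {
        static final long[] FOLLOW_INT_in_expr = {0x0000000000000002L};
    }

    // References in the parser body are qualified with the holder class.
    static long[] followForExprInStat() {
        return FollowSetsA.FOLLOW_expr_in_stat;
    }
}
```

The same reasoning covers both moving the DFA tables into separate packages
and splitting the FOLLOW_... constants into several static classes: in each
case the initialization code is spread over more than one class.
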
> >
> > 2012/8/15 Francis ANDRE <francis.andre.kampbell at orange.fr
> > <mailto:francis.andre.kampbell at orange.fr>>
> >
> >     Hi Zhaohui
> >
> >     I am currently working on fixing this issue with ANTLR 3.4...
> >     Once I have a proper patch, would you be interested in testing
> >     it?
> >
> >     FA
> >     On 14/08/2012 18:05, Zhaohui Yang wrote:
> >
> >         Hi,
> >
> >         Here we have a big grammar, and the generated parser.java
> >         fails to compile with the error: "the code for the static
> >         initializer is exceeding the 65535 bytes limit".
> >
> >         I've searched the net for a while and found that this is a
> >         widely known limit in the JVM or the javac compiler, and
> >         there is not yet an option to raise it.
> >
> >         On the ANTLR side, I found 2 solutions proposed by others,
> >         but neither of them is totally satisfying:
> >
> >         1. Separate the big grammar into 2 *.g files, and import one
> >         from the other.
> >             Yes, this removes the compilation error with the
> >         generated Java. But ANTLRWorks does not support imported
> >         grammars well. E.g., I cannot interpret a rule in the
> >         imported grammar; it's simply not in the rule list for
> >         interpreting. And gunit always fails with rules defined in
> >         the imported grammar.
> >
> >         2. Modify the generated Java source: separate the
> >         "FOLLOW_xxx_in_yyy" constants into several static classes
> >         and change the references to them accordingly.
> >             This is proposed here -
> >         http://www.antlr.org/pipermail/antlr-interest/2009-November/036608.html
> >         .
> >         The author of the post actually has a solution in the ANTLR
> >         source code (some string template). But I can't find the
> >         attachment he referred to. And that was in 2009; I suspect
> >         the fix could be incompatible with the current ANTLR version.
> >             Without this fix we have to do the modification manually
> >         or write a script for it. The script is not that easy.
> >
> >         And we found a 3rd solution ourselves, which also involves
> >         changing the generated Java:
> >
> >         3. Remove those FOLLOW_... constants completely, and replace
> >         the references with "null".
> >             Surprisingly this works; there is just no error recovery
> >         after this, which is not a problem for us. But we really
> >         worry this is unsafe, since it's not documented anywhere.
> >
> >         After all, we're looking for any other solution that is
> >         easier to apply, assuming we'll be constantly changing the
> >         grammar and recompiling the parser.
> >
> >         Maybe there is a way to get ANTLRWorks and gunit to play
> >         well with imported grammars?
> >         Maybe there is already a command-line option for the ANTLR
> >         tool that can generate the FOLLOW_... constants in separate
> >         classes?
> >         Maybe there is already a command-line option for the ANTLR
> >         tool that can suppress FOLLOW_... constant code generation?
> >
> >
> >
> >
> >
> > --
> > Regards,
> >
> > Yang, Zhaohui
> >
>
>