[antlr-interest] problem about "the code for the static initializer is exceeding the 65535 bytes limit"
Jim Idle
jimi at temporal-wave.com
Wed Aug 15 13:49:20 PDT 2012
Maybe your example is one where the lexer does need state, but it should not
cause these huge DFAs unless there is something wonky with the grammar. I am
not having a go at you ;)
I still say that you should start with the grammar. Look at the generated
DFA and see which rule/decision is causing this and left factor:
fragment MASK : ;   // token type only; never matched on its own
INT : ('0'..'9')+   /* perhaps a gated predicate here */
      ( '/' ('0'..'9')+ { $type = MASK; } )?
    ;
But, if I can't see your grammars, I can't get more specific than a few
guesses.
V4 has lexer modes, which may well help you a lot.
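For instance, here is a minimal sketch of v4 modes (an illustrative
string-island lexer, not your grammar):

  lexer grammar ModesDemo;
  LQUOTE : '"' -> pushMode(IN_STRING) ;  // enter the island mode
  WS     : [ \t\r\n]+ -> skip ;

  mode IN_STRING;
  TEXT   : ~'"'+ ;                       // anything up to the closing quote
  RQUOTE : '"' -> popMode ;              // back to the default mode
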
Jim
> -----Original Message-----
> From: Francis ANDRE [mailto:francis.andre.kampbell at orange.fr]
> Sent: Wednesday, August 15, 2012 1:08 PM
> To: Jim Idle; parrt at cs.usfca.edu >> Terence Parr
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] problem about "the code for the static
> initializer is exceeding the 65535 bytes limit"
>
> Hi Jim
>
> With all the respect I have for you, you cannot say that the only
> problem is a poorly designed grammar.
>
> First of all, I would suggest you look at languages such as COBOL or
> Natural, or at esoteric third-generation languages, to take the
> "problem" in scope. Just as an example, Natural allows this kind of
> syntax:
>
> 99 / 99   which means: divide 99 by 99
> 99/99     which is a mask for date/number editing
>
> The real solution for this kind of expression should be to let the
> lexer do the job with contextual predicates, since the WHITE token is
> generally ignored. If, because of the 64K limitation, one must use
> parser rules instead of lexer rules, then the WHITE token becomes
> fully meaningful and has to appear in ALL rules of the grammar...
> which is a really painful change, since ANTLR2 was working fine with
> contextual semantic predicates in the lexer rules (see the sketch
> below).
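>
> A minimal sketch of the contrast (ANTLR3-style; the rule names are
> mine, invented for illustration):
>
> // (a) Lexer-side, ANTLR2-style: whitespace stays hidden; the lexer
> //     itself separates the two forms, because MASK can never contain
> //     a space.
> MASK  : ('0'..'9')+ '/' ('0'..'9')+ ;   // "99/99"
> INT   : ('0'..'9')+ ;
> DIV   : '/' ;
> WHITE : (' '|'\t')+ { $channel = HIDDEN; } ;
>
> // (b) Parser-side workaround: drop MASK, stop hiding WHITE, and make
> //     every affected parser rule spell the spacing out explicitly.
> divide : INT WHITE DIV WHITE INT ;      // "99 / 99"
> mask   : INT DIV INT ;                  // "99/99"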
>
> Secondly, ANTLR, as a generic, general-purpose compiler-compiler,
> should be able to produce a lexer and parser even for a poorly
> written grammar, as long as that grammar respects the specification
> of the meta-language.
>
> Third, the 64K problem is really a Java problem, linked to the
> inlining of the DFA classes into the lexer and parser. Since
> extracting the DFAs out of the generated lexer and parser solves the
> issue, I do not see why one should reject this option: it improves
> the capability of ANTLR without compromising its functional offer.
>
> Fourth, software should adapt as best it can to the human, and not
> the contrary. That is why compilers all have an optimisation phase,
> so that people can write, for example, i = i + 1; instead of i++,
> which is the cleaner and more readable way to increment an integer.
> In the same way, ANTLR should, as far as possible, accept grammars
> that are not fully left factored, rather than force rewrites to
> overcome a Java limitation.
>
>
> Ter, what's your position on this?
>
> Francis
>
> On 15/08/2012 20:38, Jim Idle wrote:
> > It does not need a fix. It is the grammar that needs to be improved.
> > The huge DFAs are indicative of your grammars being overly complicated
> > or poorly left factored. ANTLR might do better than it does in some
> > cases, and v4 may well get around a lot of similar issues, but in
> > general, improve your grammar files.
> >
> > First, look at the generated DFA. What rule, or combination of rules,
> > is generating this? Start there. Left factor. Simplify. Stop trying to
> > do much of anything in the lexer other than match the simplest common
> > token set. Stop trying to impose semantics in the parser ("you can
> > only have at most two of 'these' here"); push such things into the
> > tree walk, or add semantic checks: allow any number of 'these', count
> > how many you got, then issue a semantic error (see the sketch below).
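> >
> > A minimal sketch of that counting approach (ANTLR3-style; the rule
> > names are invented for illustration):
> >
> > things
> >     @init { int n = 0; }
> >     : ( thing { n++; } )*
> >       {
> >         // semantic check instead of encoding the limit in the grammar
> >         if (n > 2) emitErrorMessage("at most two 'things' allowed here");
> >       }
> >     ;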
> >
> > Writing good grammars is not easy. In some ways, because it is easy to
> > just type stuff in and give it a whirl, ANTLR can cause you to shoot
> > yourself in the foot!
> >
> > Step back and consider your grammar files. Do you really want a
> > grammar that generates such huge decision tables? What is going wrong?
> > It usually is not ANTLR itself.
> >
> >
> > Jim
> >
> >
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >> bounces at antlr.org] On Behalf Of Francis ANDRE
> >> Sent: Wednesday, August 15, 2012 10:14 AM
> >> To: Zhaohui Yang
> >> Cc: antlr-interest at antlr.org
> >> Subject: Re: [antlr-interest] problem about "the code for the static
> >> initializer is exceeding the 65535 bytes limit"
> >>
> >> On 15/08/2012 16:17, Zhaohui Yang wrote:
> >>> It's great that someone is already trying a fix. I'd be glad to
> >>> test your fix when it's out.
> >>>
> >>> Would you please explain a bit what kind of fix that is? Is it for
> >>> ANTLRWorks or for the ANTLR tool? Is it a command-line option for
> >>> separating the FOLLOW sets or suppressing them, or something else?
> >> The 64K syndrome is a pure Java problem, due to the constraint that
> >> the JVM does not support a static initializer greater than 64K --
> >> shame on it. If you look at the generated lexer and parser, you will
> >> certainly see a lot of DFA classes, each of them holding some static
> >> initializer values. The point is that the sum of the static
> >> initializers of all those DFAs is greater than 64K, while the static
> >> initializer of each individual DFA is fairly small, in most cases
> >> less than 64K. Thus, one solution is to extract all those DFA
> >> classes and put them outside the lexer and parser, in fixed
> >> directories following this pattern:
> >>
> >> Let <grammar> be the directory of the grammar to generate; then all
> >> the generated DFAs will go in:
> >>
> >> for the lexer's DFAs:  package <grammar>.lexer;
> >> for the parser's DFAs: package <grammar>.parser;
> >>
> >> and the references to all those DFAs will be:
> >> in the lexer:  import <grammar>.lexer.*;
> >> in the parser: import <grammar>.parser.*;
> >>
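> >> As an illustration, here is a hypothetical sketch of one extracted
> >> DFA class (the class name, package name and table are invented, and
> >> much simplified compared to what ANTLR 3 really emits):
> >>
> >> // file: mygrammar/lexer/DFA12.java
> >> package mygrammar.lexer;
> >>
> >> public class DFA12 {
> >>     // Each extracted class gets its own static initializer, so no
> >>     // single <clinit> method exceeds the JVM's 64K code limit.
> >>     public static final short[] TRANSITION = { /* packed tables */ };
> >> }
> >>
> >> // ...and the generated lexer simply does: import mygrammar.lexer.*;
> >>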
> >> But hold on, the fix has to be approved by Ter and I have not yet
> >> submitted it. It needs to pass all the unit tests of ANTLR 3.4 and I
> >> am working on that... there is a real challenge in getting the
> >> parser/lexer to compile for Java code generated without a package,
> >> since all those unit tests produce the Java parser/lexer in the
> >> top-level directory.
> >>> 2012/8/15 Francis ANDRE <francis.andre.kampbell at orange.fr>
> >>>
> >>> Hi Zhaohui
> >>>
> >>> I am currently working on fixing this issue with ANTLR 3.4... Once
> >>> I have a proper patch, would you be interested in testing it?
> >>>
> >>> FA
> >>> On 14/08/2012 18:05, Zhaohui Yang wrote:
> >>>
> >>> Hi,
> >>>
> >>> Here we have a big grammar, and the generated parser .java file
> >>> fails to compile with: "the code for the static initializer is
> >>> exceeding the 65535 bytes limit".
> >>>
> >>> I've searched the net for a while and found that this is a widely
> >>> known limit in the JVM/javac compiler, and there is not yet an
> >>> option to raise it.
> >>>
> >>> On the ANTLR side, I found 2 solutions proposed by others, but
> >>> neither of them is totally satisfying:
> >>>
> >>> 1. Separate the big grammar into 2 *.g files, importing one from
> >>> the other.
> >>> Yes, this removes the compilation error in the generated Java. But
> >>> ANTLRWorks does not support imported grammars well. E.g., I cannot
> >>> interpret a rule in the imported grammar; it's simply not in the
> >>> rule list for interpreting. And gunit always fails with rules
> >>> defined in the imported grammar.
> >>>
> >>> 2. Modify the generated Java source: separate the
> >>> "FOLLOW_xxx_in_yyy" constants into several static classes and
> >>> change the references to them accordingly (a sketch of this idea
> >>> follows after this list).
> >>> This is proposed here:
> >>> http://www.antlr.org/pipermail/antlr-interest/2009-November/036608.html
> >>> The author of the post actually has a solution in the ANTLR source
> >>> code (some string template), but I can't find the attachment he
> >>> referred to. And that was in 2009, so I suspect the fix could be
> >>> incompatible with the current ANTLR version.
> >>> Without this fix we have to do the modification manually, or write
> >>> a script for it. The script is not that easy.
> >>>
> >>> And we found a 3rd solution by ourselves, which also involves
> >>> changing the generated Java:
> >>>
> >>> 3. Remove those FOLLOW_... constants completely, and replace the
> >>> references with "null".
> >>> Surprisingly this works; there is just no error recovery
> >>> afterwards, which is not a problem for us. But we really worry that
> >>> this is unsafe, since it's not documented anywhere.
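> >>>
> >>> For what it's worth, a minimal sketch of the splitting idea from
> >>> solution 2 (names and values invented; the real generated code uses
> >>> org.antlr.runtime.BitSet rather than raw long arrays):
> >>>
> >>> // Before: hundreds of constants in one class, blowing the 64K
> >>> // limit of its single static initializer (<clinit>).
> >>> // After: spread them over holder classes, each with its own
> >>> // <clinit>, and qualify the references.
> >>> final class FollowSets000 {
> >>>     static final long[] FOLLOW_expr_in_stat123 = { 0x10L };
> >>>     // ... more constants, up to a safe size per class ...
> >>> }
> >>> final class FollowSets001 {
> >>>     static final long[] FOLLOW_stat_in_block45 = { 0x20L };
> >>> }
> >>> // References change from FOLLOW_expr_in_stat123 to
> >>> // FollowSets000.FOLLOW_expr_in_stat123.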
> >>>
> >>> After all, we're looking for any other solution that is easier to
> >>> apply, assuming we'll be constantly changing the grammar and
> >>> recompiling the parser.
> >>>
> >>> Maybe there is a way to get ANTLRWorks and gunit to play well with
> >>> imported grammars?
> >>> Maybe there is already a command-line option for the ANTLR tool
> >>> that can generate the FOLLOW_... constants in separate classes?
> >>> Maybe there is already a command-line option for the ANTLR tool
> >>> that can suppress FOLLOW_... constants code generation?
> >>>
> >>>
> >>> --
> >>> Regards,
> >>>
> >>> Yang, Zhaohui
> >>>
>