[antlr-interest] problem about "the code for the static initializer is exceeding the 65535 bytes limit"
Jim Idle
jimi at temporal-wave.com
Wed Aug 15 13:49:20 PDT 2012
Maybe your example is one where the lexer does need state, but it should not
cause these huge DFAs unless there is something wonky with the grammar. I am
not having a go at you ;)
I still say that you should start with the grammar. Look at the generated
DFA and see which rule/decision is causing this and left factor:
fragment MASK : ;   // token type only; never matched on its own
INT : ('0'..'9')+   /* perhaps a gated predicate here */
      ( '/' ('0'..'9')+ { $type = MASK; } )?
    ;
But, if I can't see your grammars, I can't get more specific than a few
guesses.
V4 has lexer modes, which may well help you a lot.
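For instance, here is a minimal sketch of v4 modes (an illustrative
string-island lexer, not your grammar):

  lexer grammar ModesDemo;
  LQUOTE : '"' -> pushMode(IN_STRING) ;  // enter the island mode
  WS     : [ \t\r\n]+ -> skip ;

  mode IN_STRING;
  TEXT   : ~'"'+ ;                       // anything up to the closing quote
  RQUOTE : '"' -> popMode ;              // back to the default mode
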
Jim
> -----Original Message-----
> From: Francis ANDRE [mailto:francis.andre.kampbell at orange.fr]
> Sent: Wednesday, August 15, 2012 1:08 PM
> To: Jim Idle; parrt at cs.usfca.edu >> Terence Parr
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] problem about "the code for the static
> initializer is exceeding the 65535 bytes limit"
>
> Hi Jim
>
> With all the respect I have for you, you cannot say that the only
> problem is a poorly designed grammar.
>
> First of all, I would suggest you look at languages such as COBOL or
> Natural, or at esoteric third-generation languages, to take the
> "problem" in scope. Just as an example, Natural allows this kind of
> syntax:
>
> 99 / 99   which means: divide 99 by 99
> 99/99     which is a mask for date/number editing
>
> The real solution for this kind of expression should be to let the
> lexer do the job with contextual predicates, since the WHITE token is
> generally ignored. If, because of the 64K limitation, one must use
> parser rules instead of lexer rules, then the WHITE token becomes
> fully meaningful and has to appear in ALL rules of the grammar...
> which is a really painful change, since ANTLR2 was working fine with
> contextual semantic predicates in the lexer rules (see the sketch
> below).
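>
> A minimal sketch of the contrast (ANTLR3-style; the rule names are
> mine, invented for illustration):
>
> // (a) Lexer-side, ANTLR2-style: whitespace stays hidden; the lexer
> //     itself separates the two forms, because MASK can never contain
> //     a space.
> MASK  : ('0'..'9')+ '/' ('0'..'9')+ ;   // "99/99"
> INT   : ('0'..'9')+ ;
> DIV   : '/' ;
> WHITE : (' '|'\t')+ { $channel = HIDDEN; } ;
>
> // (b) Parser-side workaround: drop MASK, stop hiding WHITE, and make
> //     every affected parser rule spell the spacing out explicitly.
> divide : INT WHITE DIV WHITE INT ;      // "99 / 99"
> mask   : INT DIV INT ;                  // "99/99"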
>
> Secondly, ANTLR, as a generic, general-purpose compiler-compiler,
> should be able to produce a lexer and parser even for a poorly
> written grammar, as long as that grammar respects the specification
> of the meta-language.
>
> Third, the 64K problem is really a Java problem, linked to the
> inlining of the DFA classes into the lexer and parser. Since
> extracting the DFAs out of the generated lexer and parser solves the
> issue, I do not see why one should reject this option: it improves
> the capability of ANTLR without compromising its functional offer.
>
> Fourth, software should adapt as best it can to the human, and not
> the contrary. That is why compilers all have an optimisation phase,
> so that people can write, for example, i = i + 1; instead of i++,
> which is the cleaner and more readable way to increment an integer.
> In the same way, ANTLR should, as far as possible, accept grammars
> that are not fully left factored, rather than force rewrites to
> overcome a Java limitation.
>
>
> Ter, what's your position on this?
>
> Francis
>
> On 15/08/2012 20:38, Jim Idle wrote:
> > It does not need a fix. It is the grammar that needs to be improved.
> > The huge DFAs are indicative of your grammars being overly complicated
> > or poorly left factored. ANTLR might do better than it does in some
> > cases, and v4 may well get around a lot of similar issues, but in
> > general, improve your grammar files.
> >
> > First, look at the generated DFA. What rule, or combination of rules,
> > is generating this? Start there. Left factor. Simplify. Stop trying to
> > do much of anything in the lexer other than match the simplest common
> > token set. Stop trying to impose semantics in the parser ("you can
> > only have at most two of 'these' here"); push such things into the
> > tree walk, or add semantic checks: allow any number of 'these', count
> > how many you got, then issue a semantic error (see the sketch below).
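> >
> > A minimal sketch of that counting approach (ANTLR3-style; the rule
> > names are invented for illustration):
> >
> > things
> >     @init { int n = 0; }
> >     : ( thing { n++; } )*
> >       {
> >         // semantic check instead of encoding the limit in the grammar
> >         if (n > 2) emitErrorMessage("at most two 'things' allowed here");
> >       }
> >     ;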
> >
> > Writing good grammars is not easy. In some ways, because it is easy to
> > just type stuff in and give it a whirl, ANTLR can cause you to shoot
> > yourself in the foot!
> >
> > Step back and consider your grammar files. Do you really want a
> > grammar that generates such huge decision tables? What is going wrong?
> > It usually is not ANTLR itself.
> >
> >
> > Jim
> >
> >
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >> bounces at antlr.org] On Behalf Of Francis ANDRE
> >> Sent: Wednesday, August 15, 2012 10:14 AM
> >> To: Zhaohui Yang
> >> Cc: antlr-interest at antlr.org
> >> Subject: Re: [antlr-interest] problem about "the code for the static
> >> initializer is exceeding the 65535 bytes limit"
> >>
> >> On 15/08/2012 16:17, Zhaohui Yang wrote:
> >>> It's great that someone is already trying a fix. I'd be glad to
> >>> test your fix when it's out.
> >>>
> >>> Would you please explain a bit what kind of fix that is? Is it for
> >>> ANTLRWorks or for the ANTLR tool? Is it a command-line option for
> >>> separating the FOLLOW sets or suppressing them, or something else?
> >> The 64K syndrome is a pure Java problem, due to the constraint that
> >> the JVM does not support a static initializer greater than 64K --
> >> shame on it. If you look at the generated lexer and parser, you will
> >> certainly see a lot of DFA classes, each of them holding some static
> >> initializer values. The point is that the sum of the static
> >> initializers of all those DFAs is greater than 64K, while the static
> >> initializer of each individual DFA is fairly small, in most cases
> >> less than 64K. Thus, one solution is to extract all those DFA
> >> classes and put them outside the lexer and parser, in fixed
> >> directories following this pattern:
> >>
> >> Let <grammar> be the directory of the grammar to generate; then all
> >> the generated DFAs will go in:
> >>
> >> for the lexer's DFAs:  package <grammar>.lexer;
> >> for the parser's DFAs: package <grammar>.parser;
> >>
> >> and the references to all those DFAs will be:
> >> in the lexer:  import <grammar>.lexer.*;
> >> in the parser: import <grammar>.parser.*;
> >>
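> >> As an illustration, here is a hypothetical sketch of one extracted
> >> DFA class (the class name, package name and table are invented, and
> >> much simplified compared to what ANTLR 3 really emits):
> >>
> >> // file: mygrammar/lexer/DFA12.java
> >> package mygrammar.lexer;
> >>
> >> public class DFA12 {
> >>     // Each extracted class gets its own static initializer, so no
> >>     // single <clinit> method exceeds the JVM's 64K code limit.
> >>     public static final short[] TRANSITION = { /* packed tables */ };
> >> }
> >>
> >> // ...and the generated lexer simply does: import mygrammar.lexer.*;
> >>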
> >> But hold on, the fix has to be approved by Ter and I have not yet
> >> submitted it. It needs to pass all the unit tests of ANTLR 3.4 and I
> >> am working on that... there is a real challenge in getting the
> >> parser/lexer to compile for Java code generated without a package,
> >> since all those unit tests produce the Java parser/lexer in the
> >> top-level directory.
> >>> 2012/8/15 Francis ANDRE <francis.andre.kampbell at orange.fr>
> >>>
> >>> Hi Zhaohui
> >>>
> >>> I am currently working on fixing this issue with ANTLR 3.4... Once
> >>> I have a proper patch, would you be interested in testing it?
> >>>
> >>> FA
> >>> On 14/08/2012 18:05, Zhaohui Yang wrote:
> >>>
> >>> Hi,
> >>>
> >>> Here we have a big grammar, and the generated parser .java file
> >>> fails to compile with: "the code for the static initializer is
> >>> exceeding the 65535 bytes limit".
> >>>
> >>> I've searched the net for a while and found that this is a widely
> >>> known limit in the JVM/javac compiler, and there is not yet an
> >>> option to raise it.
> >>>
> >>> On the ANTLR side, I found 2 solutions proposed by others, but
> >>> neither of them is totally satisfying:
> >>>
> >>> 1. Separate the big grammar into 2 *.g files, importing one from
> >>> the other.
> >>> Yes, this removes the compilation error in the generated Java. But
> >>> ANTLRWorks does not support imported grammars well. E.g., I cannot
> >>> interpret a rule in the imported grammar; it's simply not in the
> >>> rule list for interpreting. And gunit always fails with rules
> >>> defined in the imported grammar.
> >>>
> >>> 2. Modify the generated Java source: separate the
> >>> "FOLLOW_xxx_in_yyy" constants into several static classes and
> >>> change the references to them accordingly (a sketch of this idea
> >>> follows after this list).
> >>> This is proposed here:
> >>> http://www.antlr.org/pipermail/antlr-interest/2009-November/036608.html
> >>> The author of the post actually has a solution in the ANTLR source
> >>> code (some string template), but I can't find the attachment he
> >>> referred to. And that was in 2009, so I suspect the fix could be
> >>> incompatible with the current ANTLR version.
> >>> Without this fix we have to do the modification manually, or write
> >>> a script for it. The script is not that easy.
> >>>
> >>> And we found a 3rd solution by ourselves, which also involves
> >>> changing the generated Java:
> >>>
> >>> 3. Remove those FOLLOW_... constants completely, and replace the
> >>> references with "null".
> >>> Surprisingly this works; there is just no error recovery
> >>> afterwards, which is not a problem for us. But we really worry that
> >>> this is unsafe, since it's not documented anywhere.
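> >>>
> >>> For what it's worth, a minimal sketch of the splitting idea from
> >>> solution 2 (names and values invented; the real generated code uses
> >>> org.antlr.runtime.BitSet rather than raw long arrays):
> >>>
> >>> // Before: hundreds of constants in one class, blowing the 64K
> >>> // limit of its single static initializer (<clinit>).
> >>> // After: spread them over holder classes, each with its own
> >>> // <clinit>, and qualify the references.
> >>> final class FollowSets000 {
> >>>     static final long[] FOLLOW_expr_in_stat123 = { 0x10L };
> >>>     // ... more constants, up to a safe size per class ...
> >>> }
> >>> final class FollowSets001 {
> >>>     static final long[] FOLLOW_stat_in_block45 = { 0x20L };
> >>> }
> >>> // References change from FOLLOW_expr_in_stat123 to
> >>> // FollowSets000.FOLLOW_expr_in_stat123.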
> >>>
> >>> After all, we're looking for any other solution that is easier to
> >>> apply, assuming we'll be constantly changing the grammar and
> >>> recompiling the parser.
> >>>
> >>> Maybe there is a way to get ANTLRWorks and gunit to play well with
> >>> imported grammars?
> >>> Maybe there is already a command-line option for the ANTLR tool
> >>> that can generate the FOLLOW_... constants in separate classes?
> >>> Maybe there is already a command-line option for the ANTLR tool
> >>> that can suppress FOLLOW_... constants code generation?
> >>>
> >>>
> >>> --
> >>> Regards,
> >>>
> >>> Yang, Zhaohui
> >>>
>