[antlr-interest] Big grammar => static initializer/method size is exceeding the 65535 bytes limit

Jim Idle jimi at temporal-wave.com
Wed Nov 4 11:53:49 PST 2009


I think that you are of course somewhat correct in that there seem to be certain ways you can express things that cause such explosions, and perhaps these can be looked at case by case for the reasons why and fixed/improved.

But my experience tells me that the things that cause such explosions are generally not the greatest ways to express things anyway. Sure, if they are mathematically equivalent then they should resolve down to the same code. But there are so many ways to construct things that I think at times practicality gets in the way. Obviously equivalence has to be proven in some finite time, but more than that, just keeping the analysis code manageable seems desirable to some extent. However, as soon as users throw in actions then things like equivalence got out the window anyway.

So, I don't generally end up with such huge explosions and if I do I usually notice that my grammar looks a bit wonky anyway and there is a better/easier way to express things.

Of course, apart from the things that might or might not be improved, there are things that cannot be done because of how users organize things. For instance:

grammar t3;

r : SEQ* EOF ;

SEQ : (INT '.')? INT ;

fragment
INT :	'0'..'9'+
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

In the SEQ rule, having the optional prefix causes quite a lot of DFA for what it is. However it is clearly equivalent to:

SEQ : INT ('.' INT)? ;

... for all practical purposes, which generates much simpler code. In a parser it is still trivial to write the AST according to what such elements 'mean' and even if you are placing actions directly in the parser, you can still construct it this second way (parser code is much better here - this is just a trivial example to labor a point). However, as soon as a user says:

SEQ : (INT '.' {action because this is present} )? INT ;

Then no amount of analysis can get around the fact that the action has to be executed. I only point out here that writing grammars without much thought to what is going on under the hood is probably not such a good idea in the first place.

In general I have seen that the user's organization of the grammar is the thing that far outweighs any cleverness that is lacking in ANTLR. But, I did say in general and "2 out of 3..." ;-). 

At the end of the day, nothing is better than a well formed left factored grammar. People seem to think that backtracking or lots of predicates adds some readability to a grammar; personally I think it is the opposite and that what happens is that people train themselves to read/see it that way. Train yourself to see it differently and you will find backtracking grammars and lots of predicates totally get in the way of your thinking. 

Of course, horses for courses as they say, but as a professional programmer, one should usually strive to produce neat, well crafted stuff and not throw stuff together and hope the tool set works it out for you. I hasten to add here that I am not throwing that accusation at the people involved here - I have no idea how they put their grammars together.

I personally don't have any objection to putting DFAs in their own space, or otherwise moving around the statics, but I do seem to think that we went down this road and there seemed to be some particularly good reason not to do it. But, either that reason is no longer valid, or I just plain forgot what it was!! The Java target isn't my bag man, so I will leave that up to Ter (I will make a not to ask him to look at it). C does not have this problem per se, but of course if you end up with huge tables of static DFA, then there is a lot of data space in the resulting object code.

Jim

> -----Original Message-----
> From: Loring Craymer [mailto:lgcraymer at yahoo.com]
> Sent: Wednesday, November 04, 2009 11:13 AM
> To: Jim Idle; Antlr interest
> Subject: Re: [antlr-interest] Big grammar => static initializer/method
> size is exceeding the 65535 bytes limit
> 
> If DFA size grew linearly with the size of the grammar, Jim would be
> correct.  The evidence I have seen is that they grow nonlinearly, so
> partitioning a grammar will not always work, and it is always best if a
> tool circumvents mysterious "explosions" during development.  Alex's
> solution is both elegant and easy to  apply, and the net cost is a few
> more .class files.
> 
> --Loring
> 
> 
> 
> 
> ----- Original Message ----
> > From: Jim Idle <jimi at temporal-wave.com>
> > To: Antlr interest <antlr-interest at antlr.org>
> > Sent: Wed, November 4, 2009 8:10:14 AM
> > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method size is exceeding the 65535 bytes limit
> >
> > Guys - you are asking for the wrong problem to be fixed (at least of
> the three
> > of you, at least two will be ;-). Try the new -X options, then look
> at splitting
> > your grammar into multiple import grammars, then start taking out
> huge
> > predicates such as (expression)=> or generally (rule)=>. You can stop
> anywhere
> > along that path if you do not feel that optimizing the grammar is
> something
> > worth your while and the first and/or second options make the DFA
> table size
> > issue go away.
> >
> > There are cases where big DFAs become inevitable, and then you should
> definitely
> > look at the import capability, which will prevent a single parser
> class being
> > used for everything and allow you to manage what goes in which class
> at the
> > grammar level.
> >
> > Jim
> >
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > > bounces at antlr.org] On Behalf Of Renee Luo
> > > Sent: Wednesday, November 04, 2009 6:45 AM
> > > To: Antlr interest
> > > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method
> > > size is exceeding the 65535 bytes limit
> > >
> > > Yes, we are trying to migrate our ANTLR2 grammar to ANTLR2, we are
> also
> > > facing this problem. If the static initialize code will be
> separated
> > > from parser_class, That's will be great for us.
> > >
> > > Renee
> > >
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > > bounces at antlr.org] On Behalf Of Andreas Meyer
> > > Sent: Wednesday, November 04, 2009 8:32 AM
> > > To: Antlr interest
> > > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method
> > > size is exceeding the 65535 bytes limit
> > >
> > > Back in the days when we tried to migrate our ANTLR2 grammar to
> ANTLR3,
> > > we also experienced this problem, due to lots of static initializer
> > > code
> > > in the _parser_ class. Our solution was to apply some perl-skript
> > > magic,
> > > but if Alex Marin now proposes a built-in solution, that is only
> good
> > > for ANTLR.
> > >
> > > Andreas
> > >
> > > Jim Idle schrieb:
> > > > I think that the issue is more likely something to do with your
> lexer
> > > specification. You should not need to worry about having lots of
> > > keywords, so one of the other rules must be causing the huge
> expansion.
> > > For instance I have problems with the complete lexer for TSQL,
> which
> > > has more keywords than you can shake a stick at.
> > > >
> > > > Did you ever post your complete lexer spec? I was out of the
> country
> > > when you first started this thread.
> > > >
> > > > Jim
> > > >
> > >
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> > > email-address
> > >
> > > This email and its attachments may be confidential and are intended
> > > solely for the use of the individual to whom it is addressed. Any
> views
> > > or opinions expressed are solely those of the author and do not
> > > necessarily represent those of ImexSystems Inc.
> > > If you are not the intended recipient of this email and its
> > > attachments, you must take no action based upon them, nor must you
> copy
> > > or show them to anyone.
> > > Please contact the sender if you believe you have received this
> email
> > > in error.
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> > > email-address
> >
> >
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-
> address
> 
> 
> 
> 





More information about the antlr-interest mailing list