[antlr-interest] Bytes Limit (Marcelo Nichele)

Marcelo Nichele marcelo.nichele at gmail.com
Thu Oct 22 21:39:25 PDT 2009


Hi,

I'm using some lexer rules such as 'aaaa', 'bbb', ... and the generated code
became too big. So I cut down on some of the lexer rules and moved them into
parser rules.

Ex:
    TAG_START_OPEN 'link' attrUriId='uri' ATTR_EQ attrValueUri=ATTR_VALUE
    (   (attrRoleId='role'   ATTR_EQ attrValueRole=ATTR_VALUE)
    |   (attrStartId='start' ATTR_EQ attrValueStart=ATTR_VALUE)
    |   (attrEndId='end'     ATTR_EQ attrValueEnd=ATTR_VALUE)
    )* TAG_EMPTY_CLOSE
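One commonly suggested alternative (not something proposed in this thread) is to lex all tag and attribute names with a single generic name rule and map the matched text to a token type with a table lookup, so the lexer DFA no longer has to tell 'link', 'uri', 'role', etc. apart character by character. The Java sketch below shows only the lookup part; the class name, the token type values, and the idea of calling it from a lexer action are all assumptions.

```java
import java.util.Map;

// Hypothetical sketch: every tag/attribute name is lexed as one generic
// NAME token, and its text is classified here instead of each keyword
// getting its own lexer rule. Token type values are made up.
final class KeywordTable {
    static final int NAME  = 4;   // generic identifier
    static final int LINK  = 5;   // 'link'
    static final int URI   = 6;   // 'uri'
    static final int ROLE  = 7;   // 'role'
    static final int START = 8;   // 'start'
    static final int END   = 9;   // 'end'

    private static final Map<String, Integer> KEYWORDS = Map.of(
        "link", LINK, "uri", URI, "role", ROLE, "start", START, "end", END);

    private KeywordTable() {}

    /** Returns the keyword's token type, or NAME if the text is not a keyword. */
    static int classify(String text) {
        return KEYWORDS.getOrDefault(text, NAME);
    }
}
```

Adding a keyword then means adding a map entry rather than another lexer rule, which keeps the generated lexer's decision code from growing with each keyword.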

Does anyone have any tips?

Thanks for help,

Marcelo


On Wed, Oct 21, 2009 at 5:00 PM, <antlr-interest-request at antlr.org> wrote:

> Today's Topics:
>
>   1. Re: Status of the CSharp3 target and my C# ports of ANTLR and
>      StringTemplate (Robert van der Hulst)
>   2. Re: Bytes Limit (David-Sarah Hopwood)
>   3. Re: Bytes Limit (Jim Idle)
>   4. Re: Status of the CSharp3 target and my C# ports of ANTLR and
>      StringTemplate (Jim Idle)
>   5. Re: Using multiple grammars with a single parser (Jim Idle)
>   6. [Antlr3 grammar] how to specify alpha token, numeric token
>      and mix of both (Hieu Phung)
>   7. Re: [Antlr3 grammar] how to specify alpha token, numeric
>      token and mix of both (Kaleb Pederson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 20 Oct 2009 22:27:43 +0200
> From: Robert van der Hulst <news at heliks.nl>
> Subject: Re: [antlr-interest] Status of the CSharp3 target and my C#
>        ports   of ANTLR and StringTemplate
> To: antlr-interest at antlr.org
> Message-ID: <8410451559.20091020222743 at heliks.nl>
> Content-Type: text/plain; charset="us-ascii"
>
> An HTML attachment was scrubbed...
> URL:
> http://www.antlr.org/pipermail/antlr-interest/attachments/20091020/65d74ab9/attachment-0001.html
>
> ------------------------------
>
> Message: 2
> Date: Wed, 21 Oct 2009 00:24:04 +0100
> From: David-Sarah Hopwood <david-sarah at jacaranda.org>
> Subject: Re: [antlr-interest] Bytes Limit
> To: antlr-interest at antlr.org
> Message-ID: <4ADE4694.5010302 at jacaranda.org>
> Content-Type: text/plain; charset=UTF-8
>
> Marcelo Nichele wrote:
> > Hi,
> >
> > I'm getting started with ANTLR, and my grammar generated a
> > specialStateTransition method that is too big.
> >
> > The error message is *The code of method specialStateTransition(int,
> > IntStream) is exceeding the 65535 bytes limit.*
> >
> > The method signature is:
> > *public int specialStateTransition(int s, IntStream _input) throws
> > NoViableAltException*
>
> Workaround:
>
> Look at the code for that method in the generated parser source
> (note that there may be multiple DFA inner classes each with a
> specialStateTransition method; the full error message should say which
> one, or just look at the largest such methods).
> Probably the code for specialStateTransition will include code copied
> from predicates in your grammar, duplicated many times. Try to simplify
> the code that is being duplicated.
>
> For example, you could declare a boolean variable in the parser class
> using @parser::members, set it to the predicate condition in an @init
> block of the relevant rule(s), and reference that variable in place of
> the original condition. (Be careful that you aren't changing the behaviour
> of the rule by moving the predicate evaluation to the @init block.)
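A hand-written Java model of the workaround above; all names are hypothetical and real generated code looks different. The predicate is computed once in a stand-in for the rule's @init block and stored in a parser field, and each case of the DFA's specialStateTransition tests that field instead of repeating the whole condition.

```java
// Hypothetical model of the workaround, not generated ANTLR code.
final class CachedPredicateParser {
    private final String input;
    // In the grammar this would be declared via @parser::members.
    private boolean startsWithTag;

    CachedPredicateParser(String input) { this.input = input; }

    // Stands in for the rule's @init block: evaluate the condition once.
    void initRule() {
        startsWithTag = input.startsWith("<") && !input.startsWith("</");
    }

    // Inner class, like the generated DFA classes: reading startsWithTag
    // here is an access to an outer-class field.
    final class Dfa {
        // Stands in for specialStateTransition(int, IntStream): each case
        // returns the next state, or -1 for "no viable alternative".
        int specialStateTransition(int s) {
            switch (s) {
                case 0: return startsWithTag ? 3 : -1;
                case 1: return startsWithTag ? 4 : 5;
                case 2: return startsWithTag ? 6 : -1;
                default: return -1;
            }
        }
    }
}
```

Each case now repeats only a field read rather than the full predicate expression, which is where the size saving would come from.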
>
>
> Suggested longer-term improvement:
>
> The size of the generated specialStateTransition methods would be
> considerably reduced if ANTLR were to automatically create temporary
> variables for predicate conditions, rather than duplicating their code.
> Since the DFA object is an instance of an inner class of the parser,
> the workaround above requires the Java compiler to generate references
> to outer class variables, which is more code than would be needed if
> ANTLR were to create such temporaries as local variables of
> specialStateTransition. Since there is no guarantee as to how often
> predicates are evaluated, that change would not affect correctness.
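A sketch of what the suggested improvement could look like, assuming the predicate may be evaluated once per call: its value is stored in a local variable at the top of specialStateTransition and every case reuses that local. This is not code ANTLR is known to generate; the BooleanSupplier is a hypothetical stand-in for the inline predicate expression.

```java
import java.util.function.BooleanSupplier;

// Sketch of the proposed code shape, not ANTLR output.
final class LocalTemporaryDfa {
    private final BooleanSupplier predicate; // stands in for the {...}? condition

    LocalTemporaryDfa(BooleanSupplier predicate) { this.predicate = predicate; }

    int specialStateTransition(int s) {
        // Evaluated exactly once per call and kept in a local, so each case
        // below reads a local variable instead of repeating the condition.
        final boolean p0 = predicate.getAsBoolean();
        switch (s) {
            case 0: return p0 ? 3 : -1;
            case 1: return p0 ? 4 : 5;
            case 2: return p0 ? 6 : -1;
            default: return -1;
        }
    }
}
```

The predicate is evaluated even for states that do not use it, which is only acceptable because, as noted above, there is no guarantee about how often predicates are evaluated.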
>
> --
> David-Sarah Hopwood  http://davidsarah.livejournal.com
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 21 Oct 2009 13:27:29 +0530
> From: "Jim Idle" <jimi at temporal-wave.com>
> Subject: Re: [antlr-interest] Bytes Limit
> To: "antlr-interest at antlr.org" <antlr-interest at antlr.org>
> Message-ID: <b6a29c89442d004088022678aea26396 at temporal-wave.com>
> Content-Type: text/plain; charset="us-ascii"
>
> This is also quite often caused by a poorly specified grammar (especially
> lexers) that causes lots of lookahead, states, etc. A good way to determine
> this is to find the DFA in question in the generated source code and see
> which decisions/rules it is handling. This should help you pin down where
> things are getting so big, and then you can look at why.
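To find the DFA in question, a rough heuristic like the following Java sketch can list how many source lines each DFA inner class in a generated file spans. It assumes the inner classes are declared as `class DFA<n> extends DFA`, as in typical ANTLR 3 Java output, and it counts braces naively, so braces inside string or char literals will throw the counts off.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough heuristic, not an ANTLR tool: report how many source lines each
// DFA inner class in a generated .java file spans, in file order.
final class DfaSizeReport {
    // Assumes declarations like "class DFA12 extends DFA {".
    private static final Pattern DFA_DECL =
        Pattern.compile("class\\s+(DFA\\d+)\\s+extends\\s+DFA\\b");

    private DfaSizeReport() {}

    static Map<String, Integer> linesPerDfa(String source) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        String current = null;   // DFA class we are currently inside, if any
        int depth = 0;           // naive brace depth within that class
        int count = 0;           // lines seen so far for that class
        boolean opened = false;  // whether the class body's '{' has been seen
        for (String line : source.split("\\R")) {
            if (current == null) {
                Matcher m = DFA_DECL.matcher(line);
                if (!m.find()) continue;
                current = m.group(1);
                depth = 0;
                count = 0;
                opened = false;
            }
            count++;
            for (char c : line.toCharArray()) {   // ignores literals: heuristic only
                if (c == '{') { depth++; opened = true; }
                else if (c == '}') depth--;
            }
            if (opened && depth == 0) {   // closing brace of the DFA class
                sizes.put(current, count);
                current = null;
            }
        }
        return sizes;
    }
}
```

Feeding it the text of the generated lexer or parser and looking at the largest entries should point at the classes whose specialStateTransition is worth reading first.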
>
>
>
> Jim
>
>
>
> From: antlr-interest-bounces at antlr.org [mailto:
> antlr-interest-bounces at antlr.org] On Behalf Of Horst Dehmer
> Sent: Tuesday, October 20, 2009 12:13 PM
> To: Marcelo Nichele; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Bytes Limit
>
>
>
> Hello Marcelo,
>
> I'm afraid you hit a hard limit of Java class files. What does your
> grammar look like; is it unusually big?
> Also have a look at
>
> http://groups.google.com/group/comp.lang.java.machine/browse_thread/thread/b0cf268515f1ef55
>
> good luck,
> horst
>
>
> On 20.10.09 06:59, "Marcelo Nichele" <marcelo.nichele at gmail.com> wrote:
>
> The code of method specialStateTransition(int, IntStream) is exceeding the
> 65535 bytes limit
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 21 Oct 2009 13:47:59 +0530
> From: "Jim Idle" <jimi at temporal-wave.com>
> Subject: Re: [antlr-interest] Status of the CSharp3 target and my C#
>        portsof ANTLR and StringTemplate
> To: "antlr-interest at antlr.org" <antlr-interest at antlr.org>
> Message-ID: <e46f9f31c2cbe74683984d2670bed31e at temporal-wave.com>
> Content-Type: text/plain; charset="us-ascii"
>
> I think we can expose the public/private stuff easily. I will talk to Ter
> in case there is some reason it is not exposed right now, but I don't think
> so, as the ability to do this is part of the current v2 grammar and I just
> coded it in my v3 grammar. We should probably move this to the ANTLR dev list.
>
>
>
> Jim
>
>
>
> From: Sam Harwell [mailto:sharwell at pixelminegames.com]
> Sent: Tuesday, October 20, 2009 10:26 PM
> To: Jim Idle; antlr-interest at antlr.org; stringtemplate-interest at antlr.org
> Subject: RE: [antlr-interest] Status of the CSharp3 target and my C#
> portsof ANTLR and StringTemplate
>
>
>
> The code should be nearly the same as that of the CSharp2 target. Here are
> the C# ports of the CSharp2 and CSharp3 targets so you can see how the
> CSharp3 one differs. Clearly it should be easy to make it work.
>
>
>
> public class CSharp2Target : Target
> {
>     public override string EncodeIntAsCharEscape(int v)
>     {
>         return "\\x" + v.ToString("X");
>     }
> }
>
> public class CSharp3Target : Target
> {
>     public override string EncodeIntAsCharEscape(int v)
>     {
>         return "\\x" + v.ToString("X");
>     }
>
>     public override string GetTarget64BitStringFromValue(ulong word)
>     {
>         return "0x" + word.ToString("X");
>     }
> }
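For comparison, a standalone Java sketch of what the two overrides above produce. It deliberately does not extend ANTLR's Target class (that dependency is left out so the example runs on its own); the class name is hypothetical and the method names simply mirror the C# ones.

```java
import java.util.Locale;

// Hypothetical standalone mirror of the two C# overrides, showing the
// literal text each one emits.
final class CSharpEscapes {
    private CSharpEscapes() {}

    // Mirrors EncodeIntAsCharEscape: "\x" followed by the value in
    // uppercase hexadecimal, e.g. 10 -> \xA.
    static String encodeIntAsCharEscape(int v) {
        return "\\x" + Integer.toHexString(v).toUpperCase(Locale.ROOT);
    }

    // Mirrors GetTarget64BitStringFromValue: C#'s ulong.ToString("X") is
    // unsigned, and Long.toHexString also treats the 64 bits as unsigned.
    static String target64BitStringFromValue(long word) {
        return "0x" + Long.toHexString(word).toUpperCase(Locale.ROOT);
    }
}
```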
>
>
>
> Something to note: I'm not sure the Java version of the tool exposes the
> property required to support marking rules as public/protected/private.
> We'll have to check that out too, but it should be straightforward.
>
>
>
> Sam
>
>
>
> From: Jim Idle [mailto:jimi at temporal-wave.com]
> Sent: Tuesday, October 20, 2009 12:08 AM
> To: Sam Harwell; antlr-interest at antlr.org;
> stringtemplate-interest at antlr.org
> Subject: RE: [antlr-interest] Status of the CSharp3 target and my C#
> portsof ANTLR and StringTemplate
>
>
>
> OK - well we can add that easily enough :-) Why don't we try it?
>
>
>
> Jim
>
>
>
> From: Sam Harwell [mailto:sharwell at pixelminegames.com]
> Sent: Tuesday, October 20, 2009 7:46 AM
> To: Jim Idle; antlr-interest at antlr.org; stringtemplate-interest at antlr.org
> Subject: RE: [antlr-interest] Status of the CSharp3 target and my C#
> portsof ANTLR and StringTemplate
>
>
>
> I think the only thing missing is the Java class required for the Java
> version to know the CSharp3 target exists.
>
>
>
> Sam
>
>
>
> From: antlr-interest-bounces at antlr.org [mailto:
> antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
> Sent: Tuesday, October 20, 2009 2:51 AM
> To: antlr-interest at antlr.org; stringtemplate-interest at antlr.org
> Subject: Re: [antlr-interest] Status of the CSharp3 target and my C#
> portsof ANTLR and StringTemplate
>
>
>
> Top posting for Sam's benefit ;-)
>
>
>
> Not being able to use the CSharp3 target from the standard version of the
> tool is going to be a turn-off for many, I think :-( What is it that your
> port of the tool has that the standard version does not? I know you have
> posted some of that, but perhaps we can summarize it and see if such things
> can be absorbed into the standard Java tool? Nobody minds you having your
> own version of anything, because it is open source, but most will want to
> use the 'official' Java version of the tool even if they are targeting C#.
>
>
>
> Thanks for the updates,
>
>
>
> Jim
>
>
>
> From: antlr-interest-bounces at antlr.org [mailto:
> antlr-interest-bounces at antlr.org] On Behalf Of Sam Harwell
> Sent: Monday, October 19, 2009 1:05 AM
> To: antlr-interest at antlr.org; stringtemplate-interest at antlr.org
> Subject: [antlr-interest] Status of the CSharp3 target and my C# ports of
> ANTLR and StringTemplate
>
>
>
> Hi everyone,
>
>
>
> Here's a status update that I know many people are asking for. For each
> portion, I'll talk about the status of the code in Perforce. At the end,
> I'll talk about the status of the posted binaries.
>
>
>
> Basic Status
>
>
>
> StringTemplate and the ANTLR Tool: Up-to-date with the Java version for all
> targets as of August 4, 2009, which covers all of the changes made earlier
> in the year and over the summer.
>
>
>
> CSharp3 Target: working and extensively used in the ANTLR Tool,
> StringTemplate, and the commercial projects I use ANTLR for. I haven't
> tested the -profile and -debug modes because I don't use them; however, the
> templates should be "close to working". Currently, the CSharp3 target can
> only be used when generating code with the C# port of the tool.
>
>
>
> Design Changes
>
>
>
> 1. Rather than package the target templates as resources in the tool's
> executable, I've chosen a flat file layout. That way, the templates for a
> target can be updated without recompiling the tool. The targets themselves
> are also implemented as individual DLLs.
>
> 2. The CSharp3 target declares rules as private methods by default. Rules
> can be made public by simply marking them as such in the grammar:
> "public compile_unit : declaration*;" I have updated the Java target's code
> generation to support this as well, but it's not checked in.
>
> 3. StringTemplate has code for a high-speed, dynamically compiled
> interpreter. By default, the build doesn't enable this mode, but when it's
> turned on the output appears to be correct. I need to do another round of
> tests, but at this point the C# ports of the ANTLR Tool and StringTemplate
> should be significantly faster than the Java versions. We've hit a brick
> wall preventing further optimization without rewriting ST, but the work on
> STv4 should give another order of magnitude improvement in template
> rendering performance.
>
>
>
> Things Holding Me Up
>
>
>
> 1. I haven't finalized the way I'm going to do assembly versioning,
> although I think I've got that worked out now. I'll send a separate mail to
> the list regarding this.
>
> 2. StringTemplate is only tested with regard to code generation for the
> ANTLR tool. In particular, its ability to locate templates in resources or
> on the file system is not documented and may or may not behave as people
> expect.
>
> 3. I'm still making periodic changes to the API as I finalize things, and
> breaking changes in production code aren't good. I don't want to suggest
> replacing the CSharp2 target until the CSharp3 target has been tested more
> by other people.
>
>
>
> Things I want to do, but not really holding up the builds
>
>
>
> I really want to package a clean integration of ANTLR+CSharp3 for MSBuild.
> We need this. This would include at least MSBuild targets file and templates
> for adding grammars to a project. Unfortunately, there are many issues I
> still need to resolve for this to be a reality, most of which have answers
> in shades of gray.
>
>
>
> Status of the Posted Build
>
>
>
> The build available for download was uploaded on fairly short notice.
> Mistakes (by me) included not having the assembly version set correctly and
> not posting the source code from the build with the binaries. I've been
> trying to wrap some of these things up before posting another build.
>
>
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Wed, 21 Oct 2009 13:44:25 +0530
> From: "Jim Idle" <jimi at temporal-wave.com>
> Subject: Re: [antlr-interest] Using multiple grammars with a single
>        parser
> To: "antlr-interest at antlr.org" <antlr-interest at antlr.org>
> Message-ID: <b51176887aa850468cec0244e1a67da0 at temporal-wave.com>
> Content-Type: text/plain; charset="us-ascii"
>
> This has been covered in previous discussions (use the search), but
> basically you create tokens that look for dates using code that is
> sensitive to the locale, which leaves you with a single lexer that uses a
> code-based match rather than a pattern-based match.
>
>
>
> DATE : '#'   // Assuming that you delimit the dates somehow
>        {
>            setText(myDateFunctionThatReturnsString());
>        }
>        '#'
>      ;
>
>
>
> There are other approaches besides returning the string, but you get the
> picture.
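A minimal Java sketch of the kind of locale-sensitive helper such an action could call. The class and method names and the "d MMMM uuuu" date format are assumptions, and the lexer-side work of consuming the date's characters from the input stream is left out; the point is only that one helper, configured with the parser's locale, accepts dates written for that locale and rejects the others.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper a lexer action could call instead of a pattern-based
// DATE rule: the same code serves every locale, selected at run time.
final class LocaleDates {
    private LocaleDates() {}

    static Optional<LocalDate> parse(String text, Locale locale) {
        // Assumed format, e.g. "3 March 2009" (English) or "3 mars 2009" (French).
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("d MMMM uuuu", locale);
        try {
            return Optional.of(LocalDate.parse(text, fmt));
        } catch (DateTimeParseException e) {
            return Optional.empty();   // not a valid date in this locale
        }
    }
}
```

This keeps a single lexer and parser, as Jim suggests; only the Locale passed in changes, which also covers Param's requirement that, say, French dates be rejected when the English locale is selected.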
>
>
>
> Jim
>
>
>
> From: antlr-interest-bounces at antlr.org [mailto:
> antlr-interest-bounces at antlr.org] On Behalf Of Parambir Singh
> Sent: Tuesday, October 20, 2009 7:06 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Using multiple grammars with a single parser
>
>
>
> Hi
>
>
>
> I am working on a project where I want to parse input in different locales
> (e.g. English, French & German dates). I don't want to create multiple
> parsers, since the semantics of the grammar don't change between locales.
> So I'll probably need multiple lexers and a single parser. Moreover, I want
> to specify a locale to the parser, and the input should be matched against
> only that particular locale (e.g. German dates should be invalid in the
> English locale).
>
>
>
> What would be the best approach to construct such a parser using ANTLR? I
> don't have much experience with ANTLR, but I read about grammar inheritance
> and think it could be useful here.
>
>
>
> Thanks
>
> Param
>
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Wed, 21 Oct 2009 18:23:47 +0800
> From: Hieu Phung <phungngochieu at gmail.com>
> Subject: [antlr-interest] [Antlr3 grammar] how to specify alpha token,
>        numeric token and mix of        both
> To: antlr-interest at antlr.org
> Message-ID:
>        <2d17b250910210323q1703906dnd5b4339e472f6adf at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi all,
>
> My grammar has 3 kinds of tokens:
> 1) number: contains numeric characters
> 2) alpha: contains alphabetic characters
> 3) mix: contains digits, letters, hyphens, full stops or spaces
>
> For example:
> 1/VEC305/03MAR/PTY
> => in the above input data, 03MAR should be interpreted as a number of
> length 2 followed by alpha of length 3. But VEC305 is a mix of length 6.
>
> If I define grammar like below:
>
> NUMBER    : ('0'..'9')+ ;
> ALPHA    : ('a'..'z'|'A'..'Z')+;
> MIX    : (NUMBER | ALPHA | OTHER)+;
> fragment OTHER    : (' ' | '-' | '.')+;
> SLANT    :    '/';
>
> ANTLR returns VEC305 and 03MAR as two MIX tokens. Is there any way to
> define tokens such that ANTLR returns number, slant, mix, slant, number,
> alpha, slant, alpha for the input "1/VEC305/03MAR/PTY"?
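A simplified Java model of why both fields come out as MIX: at each position it takes the rule with the longest match and, on a tie, prefers the rule declared first. ANTLR 3 actually predicts with a DFA rather than by trying every rule, so treat this as an approximation that happens to agree for this input. It also makes Kaleb's question below concrete: nothing in these rules distinguishes VEC305 from 03MAR. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Approximation of the lexer rules above, not ANTLR's real prediction:
// longest match wins, ties go to the rule declared first.
final class LongestMatchDemo {
    private static final String[] NAMES = {"NUMBER", "ALPHA", "MIX", "SLANT"};
    private static final Pattern[] RULES = {
        Pattern.compile("[0-9]+"),            // NUMBER
        Pattern.compile("[a-zA-Z]+"),         // ALPHA
        Pattern.compile("[0-9a-zA-Z .-]+"),   // MIX : (NUMBER | ALPHA | OTHER)+
        Pattern.compile("/"),                 // SLANT
    };

    private LongestMatchDemo() {}

    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input).region(pos, input.length());
                // Strictly longer only, so an earlier rule keeps a tie.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) throw new IllegalArgumentException("no rule matches at " + pos);
            types.add(NAMES[best]);
            pos += bestLen;
        }
        return types;
    }
}
```

For "1/VEC305/03MAR/PTY" this yields NUMBER, SLANT, MIX, SLANT, MIX, SLANT, ALPHA: MIX wins for both VEC305 and 03MAR simply because it matches more characters than NUMBER or ALPHA.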
>
> Thank you very much for your suggestions.
>
> Regards,
> Helen
>
> ------------------------------
>
> Message: 7
> Date: Wed, 21 Oct 2009 08:36:07 -0700
> From: Kaleb Pederson <kaleb.pederson at gmail.com>
> Subject: Re: [antlr-interest] [Antlr3 grammar] how to specify alpha
>        token,  numeric token and mix of        both
> To: antlr-interest at antlr.org
> Message-ID: <200910210836.07105.kaleb.pederson at gmail.com>
> Content-Type: Text/Plain;  charset="us-ascii"
>
> On Wednesday 21 October 2009 03:23:47 am Hieu Phung wrote:
> > My grammar has 3 kinds of tokens:
> > 1) number: contains numeric characters
> > 2) alpha: contains alphabetic characters
> > 3) mix: contains digits, letters, hyphens, full stops or spaces
> >
> > For example:
> > 1/VEC305/03MAR/PTY
> > => in the above input data, 03MAR should be interpreted as a number of
> > length 2 followed by alpha of length 3. But VEC305 is a mix of length 6.
>
> Hieu,
>
> How do you know that VEC305 is a mix of length six?  It sure looks like an
> alpha followed by a number to me... so what makes it special or different
> than 03MAR?
>
> --
> Kaleb Pederson
>
> Twitter - http://twitter.com/kalebpederson
> Blog - http://kalebpederson.com
>
>
> ------------------------------
>
> _______________________________________________
> antlr-interest mailing list
> antlr-interest at antlr.org
> http://www.antlr.org/mailman/listinfo/antlr-interest
>
> End of antlr-interest Digest, Vol 59, Issue 22
> **********************************************
>

