[antlr-interest] Antlr v4 - C++ target

Gokulakannan Somasundaram gokul007 at gmail.com
Wed Jan 18 03:55:56 PST 2012


Is there any chance of the C++ target becoming available sooner?
If the work on the target can be split into sub-tasks, I am ready to take on
some of them.

Thanks,
Gokul.

On Fri, Jan 13, 2012 at 2:59 AM, A Z <asicaddress at gmail.com> wrote:

> Hi Jim,
>
>  I don't think there's anything wrong with the C target. My impression of
> the code was that it was modified from the Java target and uses function
> pointers to be easily modifiable. I don't have any performance issues other
> than memory consumption and I think this is due to my atypical use case.
> Overall, the tool works great and I appreciate all the work that went into
> it.
>
>  Sam's timeline of 1 year is too long to wait for the new C++ target so
> I've already begun modifying the 3.4 lexer for my own purposes. I honestly
> wasn't expecting any more changes to ANTLR3.
>
>
>
> On Fri, Jan 13, 2012 at 12:07 AM, Jim Idle <jimi at temporal-wave.com> wrote:
>
> > I do plan on doing that in fact. However I would like to respond to the
> > criticisms here as follows:
> >
> > 1) I wrote the C runtime in under two weeks because I needed it for a
> > project, and at that time ANTLR v3 had not been released (it was in beta).
> > Hence, by waiting until the v4 runtime is stable, we should get some
> > cleaner runtimes.
> > 2) So, I did not really know how anyone else would want to use it and so I
> > made absolutely everything dynamic. Since that time there have been lots
> > of memory and performance tweaks, but I am sure there are more I can do.
> > 3) I basically copied the Java model as is with the idea being that it
> > would be easier to follow changes that were made to the Java runtime in
> > the C runtime.
> > 4) There are performance enhancements you can turn on such as adding
> > defines for ANTLR3_INLINE_INPUT_8BIT or ANTLR3_INLINE_INPUT_16BIT and
> > defining SKIP_FOLLOW_SETS to avoid stacking rule descriptors only used by
> > error reporting.
> > 5) All my tests, and most everyone else's, find the C v3 runtime to be
> > faster than the C++ runtime, so I can only conclude that there is something
> > different about one or two grammar files.
> > 6) I did implement reuse other than for trees and that helps most of the
> > use cases where the initial memory allocation takes time and so you don't
> > want to tear it down and re-allocate it.
> > 7) It is a lot easier to start with someone else's code than it is to
> > start with vi and a blank screen. Where's the love?
> > 8) ANTLR is naturally more heavyweight than some other tools, but it is
> > usually easier to use it.
> > 9) Why not wait for v4, where some of these things are addressed as a
> > natural consequence of the design?
> >
> >
> > A minimum token needs the type and a pointer to the text, plus either a
> > pointer to the end of the text or the length. If you use a length then
> > with encodings like UTF8, you will start to need to traverse the text to
> > extract nnn characters. There are always tradeoffs. Pointers are 64 bits
> > not 32 bits on a 64 bit compiler. You can compile in 32 bit mode if you
> > don't need 64 bit stuff.
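
The tradeoff described above can be sketched in C++. This is only an illustration under stated assumptions: `MinimalToken`, `byte_length`, and `count_code_points` are hypothetical names and are not part of the ANTLR runtime. The sketch stores a type plus a begin/end pointer pair, so the byte length is a pointer subtraction, while the character count of UTF-8 text still requires a scan.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal token: a type plus a [begin, end) byte range into the
// input buffer. Nothing here is taken from the ANTLR 3 C runtime.
struct MinimalToken {
    std::uint32_t type;
    const char*   begin;  // first byte of the token text
    const char*   end;    // one past the last byte of the token text
};

// Length in bytes is a constant-time pointer subtraction.
constexpr std::size_t byte_length(const MinimalToken& t) {
    return static_cast<std::size_t>(t.end - t.begin);
}

// Length in characters needs a scan when the input is UTF-8: count every
// byte that is not a continuation byte (bit pattern 10xxxxxx).
constexpr std::size_t count_code_points(const MinimalToken& t) {
    std::size_t n = 0;
    for (const char* p = t.begin; p != t.end; ++p) {
        if ((static_cast<unsigned char>(*p) & 0xC0u) != 0x80u) {
            ++n;
        }
    }
    return n;
}

// Sample text "hello" with an accented e, encoded as UTF-8:
// 6 bytes, 5 characters.
constexpr char kSample[] = "h\xC3\xA9llo";
constexpr MinimalToken kSampleToken{1, kSample, kSample + sizeof(kSample) - 1};
```

With pointers in the token, its size follows the pointer width: on a typical 64-bit build this struct would occupy 24 bytes (4 for the type, 4 of padding, 8 per pointer), and 12 bytes on a typical 32-bit build.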
> >
> > Jim
> >
> >
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > > bounces at antlr.org] On Behalf Of A Z
> > > Sent: Wednesday, January 11, 2012 5:38 PM
> > > To: Ruslan Zasukhin
> > > Cc: antlr-interest at antlr.org
> > > Subject: Re: [antlr-interest] Antlr v4 - C++ target
> > >
> > > The realistic minimum I see for commontoken in the existing 3.4 code is
> > > 32 bytes on a 64-bit architecture. This would involve modifying the
> > > code generator to no longer use the function pointers (for
> > > setStart/setStopIndex/setType) and using a smaller data type for the
> > > channel, factory and type members. There is still an additional
> > > 16B/token used by the vector data structure holding the tokens.
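
As a rough sketch of what a 32-byte token could look like, the following C++ layout uses fixed-width members and plain inline setters instead of per-token function pointers. The field names and widths are assumptions chosen for illustration, not the actual ANTLR 3.4 structure, and `factoryIndex` is a hypothetical replacement for a factory pointer.

```cpp
#include <cstdint>

// Hypothetical compact token layout; field names and widths are assumptions
// for illustration, not the real ANTLR 3.4 common token.
struct CompactToken {
    std::int64_t  start;         // index of the first character in the input
    std::int64_t  stop;          // index of the last character in the input
    std::uint32_t type;          // token type, narrowed to 32 bits
    std::uint32_t line;          // line number
    std::int32_t  charPosition;  // column within the line
    std::uint16_t channel;       // narrowed to 16 bits
    std::uint16_t factoryIndex;  // small index instead of a factory pointer
};

// 8 + 8 + 4 + 4 + 4 + 2 + 2 = 32 bytes, with no padding required.
static_assert(sizeof(CompactToken) == 32, "CompactToken should be 32 bytes");

// Plain inline setters take the place of the per-token function pointers.
inline void set_type(CompactToken& t, std::uint32_t type) { t.type = type; }
inline void set_start_index(CompactToken& t, std::int64_t i) { t.start = i; }
inline void set_stop_index(CompactToken& t, std::int64_t i) { t.stop = i; }
```

The members are ordered from widest to narrowest so that the compiler does not need to insert padding between them.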
> > >
> > >
> > >
> > > On Wed, Jan 11, 2012 at 5:09 PM, Ruslan Zasukhin <
> > > ruslan_zasukhin at valentina-db.com> wrote:
> > >
> > > > On 1/11/12 11:12 AM, "Loring Craymer" <lgcraymer at yahoo.com> wrote:
> > > >
> > > > > > If Jim did not implement the vtable indirection (that could be
> > > > > > easily changed, if so), then there is a little more opportunity
> > > > > > for optimization, but still the problem is that state information
> > > > > > takes up much more memory than does the text in tokens.
> > > >
> > > > Right,
> > > >
> > > > > Well, let's look at antlr3commontoken.h
> > > >
> > > > > API:
> > > > >        19 pointers to functions
> > > > >        on a 32-bit OS: 19 * 4 = 76 bytes
> > > > >
> > > > > And about
> > > > >        11 * 4 = 44 bytes of useful info
> > > >
> > > >
> > > > > So there is a chance that, in C++ style or with a single pointer to
> > > > > a vtable-like structure, the token will go
> > > > >
> > > > >    from about 120 bytes (76 + 44) to 48 bytes (4 + 44)
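
The arithmetic above can be checked with a small C++ sketch that compares a token carrying its own 19 function pointers against one holding a single pointer to a shared table. All names here (`FatToken`, `SlimToken`, `TokenVTable`, `TokenFn`) are hypothetical, and the 11-word `info` array is only a stand-in for the useful data counted above, not the real field list.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical uniform signature standing in for the 19 API functions.
using TokenFn = void (*)(void*);

// Each token carries its own copy of all 19 function pointers.
struct FatToken {
    TokenFn       api[19];
    std::uint32_t info[11];  // stand-in for about 11 words of useful data
};

// One shared table for all tokens, similar in spirit to a C++ vtable.
struct TokenVTable {
    TokenFn api[19];
};

// Each token holds a single pointer to the shared table.
struct SlimToken {
    const TokenVTable* vt;
    std::uint32_t      info[11];
};

// Bytes saved per token by sharing the table.
constexpr std::size_t kSavedPerToken = sizeof(FatToken) - sizeof(SlimToken);
```

With 4-byte pointers the two structs come to 120 and 48 bytes. On a typical 64-bit build they would be 200 and 56 bytes after padding, so the saving of 18 pointers per token grows with the pointer width.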
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > > Ruslan Zasukhin
> > > > VP Engineering and New Technology
> > > > Paradigma Software, Inc
> > > >
> > > > > Valentina - Joining Worlds of Information
> > > > > http://www.paradigmasoft.com
> > > >
> > > > [I feel the need: the need for speed]
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > > > Unsubscribe:
> > > > > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> > > >
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> > > email-address
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>

