[antlr-interest] Antlr v4 - C++ target

Thu Jan 12 10:59:31 PST 2012

Hi Jim,

  I don't think there's anything wrong with the C target. My impression of
the code was that it was modified from the Java target and uses function
pointers to be easily modifiable. I don't have any performance issues other
than memory consumption and I think this is due to my atypical use case.
Overall, the tool works great and I appreciate all the work that went into
it.

  Sam's timeline of 1 year is too long to wait for the new C++ target so
I've already begun modifying the 3.4 lexer for my own purposes. I honestly
wasn't expecting any more changes to ANTLR3.

On Fri, Jan 13, 2012 at 12:07 AM, Jim Idle <jimi at temporal-wave.com> wrote:

> I do plan on doing that in fact. However I would like to respond to the
> criticisms here as follows:
>
> 1) I wrote the C runtime in under two weeks because I needed it for a
> project and at that time ANTLR v3 was not released (beta). Hence, by
> waiting until the v4 runtime is stable, we should get some cleaner
> runtimes.
> 2) So, I did not really know how anyone else would want to use it and so I
> made absolutely everything dynamic. Since that time there have been lots
> of memory and performance tweaks, but I am sure there are more I can do.
> 3) I basically copied the Java model as is with the idea being that it
> would be easier to follow changes that were made to the Java runtime in
> the C runtime.
> 4) There are performance enhancements you can turn on such as adding
> defines for ANTLR3_INLINE_INPUT_8BIT or ANTLR3_INLINE_INPUT_16BIT and
> defining SKIP_FOLLOW_SETS to avoid stacking rule descriptors only used by
> error reporting.
> 5) All my tests, and most everyone else's, find the C v3 runtime to be
> faster than the C++ runtime, so I can only conclude that there is
> something different about one or two grammar files.
> 6) I did implement reuse other than for trees and that helps most of the
> use cases where the initial memory allocation takes time and so you don't
> want to tear it down and re-allocate it.
> 7) It is a lot easier to start with someone else's code than it is to
> start with vi and a blank screen. Where's the love?
> 8) ANTLR is naturally more heavyweight than some other tools, but it is
> usually easier to use it.
> 9) Why not wait for v4, where some of these things are addressed as a
> natural consequence of the design?
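
To illustrate point 4 above: a compile-time define can replace a call through a function pointer with a direct buffer read. The sketch below is not the ANTLR3 runtime's code. `demo_input`, `DEMO_LA` and `DEMO_INLINE_INPUT_8BIT` are made-up names that only stand in for the runtime's structures and for defines such as ANTLR3_INLINE_INPUT_8BIT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input stream; not the ANTLR3 runtime's real structure. */
typedef struct demo_input {
    const unsigned char *data;
    size_t               pos;
    size_t               len;
    /* Dynamic dispatch: by default every lookahead goes through this. */
    int (*la)(struct demo_input *in);
} demo_input;

/* Generic 8-bit lookahead, reached through the function pointer. */
static int demo_la_8bit(demo_input *in)
{
    assert(in != NULL);
    return in->pos < in->len ? (int)in->data[in->pos] : -1;
}

/* DEMO_INLINE_INPUT_8BIT is an assumed switch for this sketch. When it is
 * defined, lookahead reads the buffer directly and skips the indirect call. */
#ifdef DEMO_INLINE_INPUT_8BIT
#define DEMO_LA(in) ((in)->pos < (in)->len ? (int)(in)->data[(in)->pos] : -1)
#else
#define DEMO_LA(in) ((in)->la(in))
#endif

/* Consume leading ASCII digits; the loop body is identical in both modes. */
static size_t demo_count_digits(demo_input *in)
{
    size_t n = 0;
    assert(in != NULL);
    for (;;) {
        int c = DEMO_LA(in);
        if (c < '0' || c > '9')
            break;
        in->pos++;
        n++;
    }
    return n;
}
```

Building with -DDEMO_INLINE_INPUT_8BIT changes only how DEMO_LA expands, so the same lexer loop runs either way and the two builds can be timed against each other.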
>
>
> A minimum token needs the type and a pointer to the text, plus either a
> pointer to the end of the text or the length. If you use a length then
> with encodings like UTF8, you will start to need to traverse the text to
> extract nnn characters. There are always tradeoffs. Pointers are 64 bits
> not 32 bits on a 64 bit compiler. You can compile in 32 bit mode if you
> don't need 64 bit stuff.
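
A minimal sketch of the tradeoff described above, assuming a token that stores a type and a [start, stop) byte range. `mini_token` and its helpers are hypothetical names, not ANTLR types. The byte length is a pointer subtraction, but a character count in UTF-8 needs a pass over the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal token: a type plus a half-open byte range. */
typedef struct mini_token {
    uint32_t    type;
    const char *start;  /* first byte of the token text */
    const char *stop;   /* one past the last byte */
} mini_token;

/* Byte length costs one subtraction. */
static size_t mini_token_bytes(const mini_token *t)
{
    assert(t != NULL && t->stop >= t->start);
    return (size_t)(t->stop - t->start);
}

/* Character count walks the text and counts every byte that is not a
 * UTF-8 continuation byte (10xxxxxx). Assumes the text is valid UTF-8. */
static size_t mini_token_utf8_chars(const mini_token *t)
{
    size_t n = 0;
    const char *p;
    assert(t != NULL && t->stop >= t->start);
    for (p = t->start; p < t->stop; p++) {
        if (((unsigned char)*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

On common 64-bit ABIs this struct typically occupies 24 bytes (4 for the type, 4 of padding, two 8-byte pointers), and on common 32-bit ABIs 12 bytes, which is the pointer-size effect mentioned above.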
>
> Jim
>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > bounces at antlr.org] On Behalf Of A Z
> > Sent: Wednesday, January 11, 2012 5:38 PM
> > To: Ruslan Zasukhin
> > Cc: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Antlr v4 - C++ target
> >
> > The realistic minimum I see for commontoken in the existing 3.4 code is
> > 32 bytes on a 64-bit architecture. This would involve modifying the
> > code generator to no longer use the function pointers (for
> > setStart/setStopIndex/setType) and using smaller data types for the
> > channel, factory and type members. There is still an additional
> > 16B/token used by the vector data structure holding the tokens.
> >
> >
> >
> > On Wed, Jan 11, 2012 at 5:09 PM, Ruslan Zasukhin <
> > ruslan_zasukhin at valentina-db.com> wrote:
> >
> > > On 1/11/12 11:12 AM, "Loring Craymer" <lgcraymer at yahoo.com> wrote:
> > >
> > > > If Jim did not implement the vtable indirection (that could be
> > > > easily changed, if so), then there is a little more opportunity
> > > > for optimization, but still the problem is that state information
> > > > takes up much more memory than does the text in tokens.
> > >
> > > Right,
> > >
> > > Well, let's look at antlr3commontoken.h
> > >
> > > API:
> > >        19 pointers to functions
> > >                        32-bit OS    19 * 4  = 76 bytes
> > >
> > > And about
> > >        11 * 4 bytes  of useful info
> > >
> > >
> > > So there is a chance that in C++ style, or with a single pointer
> > > to a vtable-like structure, a token will go
> > >
> > >    from 118 bytes to 48 bytes
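
The arithmetic above can be checked with two layouts. This is a sketch under the counts quoted in the message (19 function pointers, 11 four-byte fields); the struct names are hypothetical and the real token fields are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_METHODS 19  /* function pointers counted in the message above */
#define N_FIELDS  11  /* 4-byte data fields counted in the message above */

typedef void (*token_fn)(void *self);

/* Layout A: every token carries its own copy of all method pointers. */
typedef struct fat_token {
    token_fn methods[N_METHODS];
    uint32_t fields[N_FIELDS];
} fat_token;

/* One shared method table for all tokens. */
typedef struct token_vtable {
    token_fn methods[N_METHODS];
} token_vtable;

/* Layout B: every token carries a single pointer to the shared table. */
typedef struct slim_token {
    const token_vtable *vt;
    uint32_t            fields[N_FIELDS];
} slim_token;

/* Bytes saved per token by sharing the method table. */
static size_t bytes_saved_per_token(void)
{
    assert(sizeof(fat_token) > sizeof(slim_token));
    return sizeof(fat_token) - sizeof(slim_token);
}
```

With 4-byte pointers, layout A is typically 76 + 44 = 120 bytes and layout B 4 + 44 = 48 bytes, close to the figures quoted above. With 8-byte pointers the usual results are 200 and 56 bytes after padding. In both cases the saving is 18 pointers per token.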
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Ruslan Zasukhin
> > > VP Engineering and New Technology
> > > Paradigma Software, Inc
> > >
> > > Valentina - Joining Worlds of Information
> > > > http://www.paradigmasoft.com
> > >
> > > [I feel the need: the need for speed]
> > >
> > >
> > >
> > >
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe:
> > > > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> > >