[antlr-interest] Antlr v4 - C++ target

A Z asicaddress at gmail.com
Fri Jan 20 18:22:26 PST 2012


With this (very quickly written) code I see about 7 MB/sec for the lexer
using clang++ with static linking. Memory use is about 30:1, but many
features have been removed, such as getText and setText.


On Wed, Jan 18, 2012 at 5:55 PM, Gokulakannan Somasundaram <
gokul007 at gmail.com> wrote:

> Is there any chance of the C++ target becoming available sooner?
> If the target work can be split into sub-tasks, I am ready to take on
> some of them.
>
> Thanks,
> Gokul.
>
> On Fri, Jan 13, 2012 at 2:59 AM, A Z <asicaddress at gmail.com> wrote:
>
> > Hi Jim,
> >
> >  I don't think there's anything wrong with the C target. My impression of
> > the code was that it was modified from the Java target and uses function
> > pointers to be easily modifiable. I don't have any performance issues other
> > than memory consumption, and I think this is due to my atypical use case.
> > Overall, the tool works great and I appreciate all the work that went into
> > it.
> >
> >  Sam's timeline of 1 year is too long to wait for the new C++ target, so
> > I've already begun modifying the 3.4 lexer for my own purposes. I honestly
> > wasn't expecting any more changes to ANTLR3.
> >
> >
> >
> > On Fri, Jan 13, 2012 at 12:07 AM, Jim Idle <jimi at temporal-wave.com> wrote:
> >
> > > I do plan on doing that in fact. However I would like to respond to the
> > > criticisms here as follows:
> > >
> > > 1) I wrote the C runtime in under two weeks because I needed it for a
> > > project, and at that time ANTLR v3 was not yet released (it was in beta).
> > > Hence, by waiting until the v4 runtime is stable, we should get some
> > > cleaner runtimes.
> > > 2) So, I did not really know how anyone else would want to use it, and so I
> > > made absolutely everything dynamic. Since that time there have been lots
> > > of memory and performance tweaks, but I am sure there are more I can do.
> > > 3) I basically copied the Java model as is with the idea being that it
> > > would be easier to follow changes that were made to the Java runtime in
> > > the C runtime.
> > > 4) There are performance enhancements you can turn on, such as adding
> > > defines for ANTLR3_INLINE_INPUT_8BIT or ANTLR3_INLINE_INPUT_16BIT and
> > > defining SKIP_FOLLOW_SETS to avoid stacking rule descriptors that are only
> > > used by error reporting.
> > > 5) My tests, and those of most other users, find the C v3 runtime to be
> > > faster than the C++ runtime, so I can only conclude that there is
> > > something different about one or two grammar files.
> > > 6) I did implement reuse (except for trees), and that helps in most of
> > > the use cases where the initial memory allocation takes time and so you
> > > don't want to tear it down and re-allocate it.
> > > 7) It is a lot easier to start with someone else's code than it is to
> > > start with vi and a blank screen. Where's the love?
> > > 8) ANTLR is naturally more heavyweight than some other tools, but it is
> > > usually easier to use it.
> > > 9) Why not wait for v4, where some of these things are addressed as a
> > > natural consequence of the design?
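Point 6 above describes reusing structures whose initial allocation is expensive instead of tearing them down for every input. A minimal C++ sketch of that pattern follows; `Token` and `TokenStore` are hypothetical names for illustration, not the C runtime's own reuse API, which the thread does not show:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size token record (not the C runtime's ANTLR3_COMMON_TOKEN).
struct Token {
    std::uint32_t type;
    std::uint32_t start;  // byte offset of the first character
    std::uint32_t stop;   // byte offset of the last character
};

// Token store that is cleared and refilled for each new input instead of being
// torn down and re-allocated, so the expensive allocation happens only once.
class TokenStore {
public:
    explicit TokenStore(std::size_t expected) { tokens_.reserve(expected); }

    void add(const Token& t) {
        assert(t.start <= t.stop);  // a token covers at least one character
        tokens_.push_back(t);
    }

    // Forget the previous input's tokens; std::vector::clear() keeps capacity().
    void reset() { tokens_.clear(); }

    std::size_t size() const { return tokens_.size(); }
    std::size_t capacity() const { return tokens_.capacity(); }

private:
    std::vector<Token> tokens_;
};
```

Because `reset()` only calls `clear()`, refilling the store for a new input with no more tokens than it has reserved does not allocate again.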
> > >
> > >
> > > A minimum token needs the type and a pointer to the text, plus either a
> > > pointer to the end of the text or the length. If you use a length, then
> > > with encodings like UTF-8 you will need to traverse the text to extract
> > > nnn characters. There are always tradeoffs. Pointers are 64 bits, not 32
> > > bits, with a 64-bit compiler. You can compile in 32-bit mode if you don't
> > > need 64-bit addressing.
> > >
> > > Jim
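To illustrate the tradeoff Jim describes, here is a sketch of the two minimal layouts (type plus text pointer, with either a byte length or an end pointer) and the walk needed to count UTF-8 characters. The struct and function names are hypothetical, not taken from any ANTLR runtime:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical minimal token: type, start of text, and a byte length.
struct TokenLen {
    std::uint32_t type;
    std::uint32_t length;  // length of the text in bytes
    const char* text;      // start of the text in the input buffer
};

// Hypothetical minimal token: type, start of text, and an end pointer.
struct TokenEnd {
    std::uint32_t type;
    const char* text;      // start of the text in the input buffer
    const char* end;       // one past the last byte of the text
};

// In UTF-8 a byte count is not a character count. Every character begins with
// a byte that is not a continuation byte (10xxxxxx), so counting characters
// means walking the whole token text.
std::size_t utf8_char_count(const char* text, std::size_t bytes) {
    assert(text != nullptr || bytes == 0);
    std::size_t chars = 0;
    for (std::size_t i = 0; i < bytes; ++i) {
        const auto b = static_cast<unsigned char>(text[i]);
        if ((b & 0xC0) != 0x80) ++chars;  // count ASCII and lead bytes only
    }
    return chars;
}
```

On a typical 64-bit ABI `TokenLen` occupies 16 bytes and `TokenEnd` 24, while a 32-bit build usually packs both into 12 bytes, which is the pointer-width effect mentioned above.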
> > >
> > >
> > > > -----Original Message-----
> > > > From: antlr-interest-bounces at antlr.org
> > > > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of A Z
> > > > Sent: Wednesday, January 11, 2012 5:38 PM
> > > > To: Ruslan Zasukhin
> > > > Cc: antlr-interest at antlr.org
> > > > Subject: Re: [antlr-interest] Antlr v4 - C++ target
> > > >
> > > > The realistic minimum I see for CommonToken in the existing 3.4 code is
> > > > 32 bytes on a 64-bit architecture. This would involve modifying the code
> > > > generator to no longer use the function pointers (for
> > > > setStart/setStopIndex/setType) and using a smaller data type for the
> > > > channel, factory and type members. There is still an additional
> > > > 16 B/token used by the vector data structure holding the tokens.
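A sketch of a token layout along the lines A Z describes: no per-token function pointers, and narrower types for the type, channel and factory fields. The exact field set below is an assumption chosen to land at 32 bytes; it is not the 3.4 runtime's structure, and `makeToken`, `setType` and `TokenVector` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical compact token: behaviour lives in free functions rather than
// per-token function pointers, and small fields use narrow integer types.
struct CompactToken {
    std::int64_t start;               // start index into the input
    std::int64_t stop;                // stop index into the input
    std::uint32_t line;               // line number
    std::uint32_t charPositionInLine;
    std::uint16_t type;               // token type
    std::uint8_t channel;             // e.g. default or hidden channel
    std::uint8_t factory;             // index into a table of token factories
    std::uint32_t index;              // position in the token stream
};

// 8 + 8 + 4 + 4 + 2 + 1 + 1 + 4 = 32 bytes, with no padding required.
static_assert(sizeof(CompactToken) == 32, "expected a 32-byte token");

// Tokens stored by value, contiguously, with no separate pointer per token.
using TokenVector = std::vector<CompactToken>;

inline CompactToken makeToken(std::uint16_t type, std::int64_t start, std::int64_t stop) {
    assert(start <= stop);
    CompactToken t{};  // zero-initialize every field
    t.start = start;
    t.stop = stop;
    t.type = type;
    return t;
}

// Plain function in place of a per-token setType function pointer.
inline void setType(CompactToken& t, std::uint16_t type) { t.type = type; }
```

Whether storing tokens by value like this would also remove the extra per-token vector overhead mentioned above depends on how the token stream is organized, which the thread does not detail.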
> > > >
> > > >
> > > >
> > > > On Wed, Jan 11, 2012 at 5:09 PM, Ruslan Zasukhin <
> > > > ruslan_zasukhin at valentina-db.com> wrote:
> > > >
> > > > > On 1/11/12 11:12 AM, "Loring Craymer" <lgcraymer at yahoo.com> wrote:
> > > > >
> > > > > > If Jim did not implement the vtable indirection (that could be
> > > > > > easily changed, if so), then there is a little more opportunity for
> > > > > > optimization, but still the problem is that state information takes
> > > > > > up much more memory than does the text in tokens.
> > > > >
> > > > > Right,
> > > > >
> > > > > Well, let's look at antlr3commontoken.h
> > > > >
> > > > > API:
> > > > >        19   pointers to functions
> > > > >                        32-bit OS    19 * 4  = 76 bytes
> > > > >
> > > > > And about
> > > > >        11 * 4 = 44 bytes  of useful info
> > > > >
> > > > >
> > > > > So there is a chance that, in C++ style
> > > > > OR with a single pointer to a vtable, the token will shrink
> > > > >
> > > > >    from about 120 bytes (76 + 44) to 48 bytes (4 + 44)
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > > Ruslan Zasukhin
> > > > > VP Engineering and New Technology
> > > > > Paradigma Software, Inc
> > > > >
> > > > > Valentina - Joining Worlds of Information  http://www.paradigmasoft.com
> > > > >
> > > > > [I feel the need: the need for speed]
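Ruslan's estimate compares a token that carries 19 function pointers of its own with one that holds a single pointer to a shared vtable. The sketch below uses hypothetical names and an assumed payload of eleven 4-byte fields (44 bytes); it is not the real antlr3commontoken.h layout, but it lets the compiler report both sizes:

```cpp
#include <cassert>
#include <cstdint>

// Assumed payload: eleven 4-byte fields, 44 bytes in total.
struct Payload {
    std::uint32_t fields[11];
};

// C-runtime style: every token object carries its own function pointers.
struct FnPtrToken;
using Getter = std::uint32_t (*)(const FnPtrToken*);
struct FnPtrToken {
    Getter methods[19];  // 19 per-token function pointers
    Payload data;
};

// C++ style: the methods live in one vtable shared by all tokens; each object
// holds only a single hidden pointer to it.
class VirtualToken {
public:
    virtual ~VirtualToken() = default;
    virtual std::uint32_t getType() const { return data.fields[0]; }
    Payload data{};
};
```

On a 32-bit build this typically yields 120 bytes for `FnPtrToken` (76 + 44) and 48 bytes for `VirtualToken` (4 + 44); on a 64-bit build the pointers double, giving roughly 200 and 56 bytes.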
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > > > Unsubscribe:
> > > > > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> > > > >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: antlrcpp.tar.gz
Type: application/x-gzip
Size: 168128 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20120121/a8638792/attachment-0001.gz 

