[antlr-interest] ANTLR Peak Memory Issue

Gokulakannan Somasundaram gokul007 at gmail.com
Fri Jan 20 03:08:07 PST 2012


From the discussions above, this looks like a known problem. Jim Idle, the
author of the C target, implemented all the functions as function pointers
stored inside the structs, apparently for code flexibility. This increases
the memory usage of parsers, especially on 64-bit platforms, where each
pointer takes 8 bytes.

a) Removing $text is a must, which you have already done.
b) Check whether the suggestion mentioned here works for you:
http://www.antlr.org/pipermail/antlr-interest/2010-March/037840.html
c) It has already been communicated on the list that a C++ target for 4.0
will not be available for at least the next year.

Gokul.

On Thu, Jan 19, 2012 at 2:27 PM, Manu Chopra <mchopra at cadence.com> wrote:

> We are writing a parser using ANTLR, but we see unusually high peak
> memory usage. Some key points:
> * We use the ANTLR C target. Our application is C++.
> * There are no references to $text etc. We use the token start and end
> pointers directly to form strings ourselves, where required.
> * The grammar is LL(2).
> * We observe that the token structure itself is bulky: 264 bytes or
> so on a 64-bit platform.
> * Further, it looks like ANTLR tokenizes the entire source up front.
> Combined with the large token size, this brings us close to peak memory
> even before real parsing begins.
> * I went through some of the earlier posts on the forum and saw the idea
> of partitioning the source file lexically and processing it in parts. The
> problem is that even the sections of the source that can be lexically
> identified are large enough to give us memory problems.
>
> Can you suggest options we can explore to reduce the peak memory? Is there
> a token stream implementation which keeps only a constant number of
> tokens in memory, as opposed to all of them?
>
> I also understand that active development work is going on with ANTLR 4.x:
> * Can you share some information on the likely availability of a C/C++
> target? An approximate time frame is fine.
> * Is the token size going to be smaller in 4.x?
> * Will it still require that all text be tokenized up front?
>
> Thank you,
> -Manu.

