[antlr-interest] Memory management of C target

Tue Feb 1 09:15:19 PST 2011

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Marco Trudel
> Sent: Tuesday, February 01, 2011 5:39 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Memory management of C target
>
> Dear Jim
>
> On 31.01.2011 18:43, Jim Idle wrote:
> > The C target will be a lot faster than the Java target, but the
> > objects that are created are probably bigger. For v4 I plan to reduce
> > that a lot. It is probably better to reduce the input though. 530,000
> > lines of C code as input seems a bit of a tall order for anything,
> > even if you parse it. The individual input files would be better.
>
> The splitting function I wrote is way slower (> 1 min) than the time
> the Java target needs to parse the whole file (20s).

You need a better splitting function! Do it in place, not by memcpy and
so on. You can also write a simple override for the character stream.
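
A minimal sketch of what "in place" could look like, assuming the whole
file is already in one buffer. The type chunk_view and the function
split_in_place are hypothetical names for illustration, and cutting every
N lines is only a placeholder: a real splitter for C source would need to
cut at a boundary the grammar can accept, such as between top-level
declarations. The point is that each chunk is a pointer and a length into
the original buffer, so nothing is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view type: a chunk is a window into the original buffer. */
typedef struct {
    const char *start;  /* first byte of the chunk */
    size_t      len;    /* number of bytes in the chunk */
} chunk_view;

/*
 * Split buf[0..len) into views of at most lines_per_chunk lines each.
 * No bytes are copied; every view points into buf.
 * Returns the number of views written (never more than max_chunks).
 * If max_chunks runs out, the last view absorbs the rest of the input.
 */
size_t split_in_place(const char *buf, size_t len, size_t lines_per_chunk,
                      chunk_view *out, size_t max_chunks)
{
    assert(buf != NULL && out != NULL && lines_per_chunk > 0);

    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    while (p < end && count < max_chunks) {
        const char *q = p;
        size_t lines = 0;

        /* Advance q past lines_per_chunk newlines, or to the end of input. */
        while (q < end && lines < lines_per_chunk) {
            const char *nl = memchr(q, '\n', (size_t)(end - q));
            if (nl == NULL) {
                q = end;
                break;
            }
            q = nl + 1;
            lines++;
        }

        /* Out of slots: the last view takes everything that remains. */
        if (count + 1 == max_chunks) {
            q = end;
        }

        out[count].start = p;
        out[count].len = (size_t)(q - p);
        count++;
        p = q;
    }
    return count;
}
```

Each view can then be handed to an input stream that reads the caller's
buffer directly instead of copying it. In the 3.x C runtime versions I am
aware of, that is antlr3NewAsciiStringInPlaceStream; check the headers of
your runtime version before relying on that name.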

>
> > Also, I think you were using $text references in your parser and
> > these will create hundreds of thousands of string objects that will
> > not be released until you release the parser.
>
> You mean like in:
>
> foo
>     : IDENTIFIER { printf("\%s\n", $IDENTIFIER.text->chars); }
>     ;
>
> This is done about 90'000 times. Nothing 2gb memory couldn't handle.

It is a ton of tiny allocations and it will accumulate. However, I think
that in 3.3 I have fixed a bug that was not releasing memory references
when building a tree until the tree was freed. Try making a version that
does not build a tree and see how it differs.

>
> > To use the text of an object it is
> > better to get the pointer to the input from that object and use the
> > length (start and end pointer are stored in the object) so that you
> > make no copies or memory allocations.
>
> Like:
>
> foo
>     : IDENTIFIER
>       { printf("Input length: \%d, start: \%d, end: \%d\n",
>            strlen($IDENTIFIER->input->data),
>            $IDENTIFIER->start,
>            $IDENTIFIER->stop);
>       }
>     ;
>
> I guess I'm doing something wrong. Though the length is correct, the
> indexes are way out of bounds (example output: Input length: 253,
> start: 9338085, end: 9338088).

As per the prior email, these are POINTERs to the start and end addresses
in the input text. You cannot use strlen, as this will not stop until the
end of the input string. The start pointer points directly into the input
text and so does the end pointer. The length is the difference in the two.
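
A rough sketch of using the two pointers without copying or allocating.
The struct token_span and the helper names are hypothetical stand-ins, not
the runtime's own types: in the real runtime the start and stop fields
appear to be stored as integer-typed markers (which would explain the large
numbers printed above) and would need a cast to a character pointer. The
sketch also assumes that stop points at the LAST character of the token,
so the byte count is stop - start + 1; verify that against your runtime
version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a token's start/stop pointers into the input. */
typedef struct {
    const char *start;  /* first character of the token text */
    const char *stop;   /* last character of the token text (assumed inclusive) */
} token_span;

/* Number of bytes in the token text, under the inclusive-stop assumption. */
size_t token_length(const token_span *t)
{
    assert(t != NULL && t->start != NULL && t->stop != NULL);
    assert(t->stop >= t->start);
    return (size_t)(t->stop - t->start) + 1;
}

/*
 * Format the token text into dst without strlen on the input and without
 * any heap allocation. "%.*s" prints at most the given number of bytes.
 * Returns what snprintf returns.
 */
int format_token(char *dst, size_t cap, const token_span *t)
{
    return snprintf(dst, cap, "%.*s", (int)token_length(t), t->start);
}

/* Compare the token text against a C string directly in the input buffer. */
int token_equals(const token_span *t, const char *s)
{
    size_t n = token_length(t);
    return strlen(s) == n && memcmp(t->start, s, n) == 0;
}
```

Inside a grammar action the same "%.*s" idea applies to printf, with the
percent sign escaped as in the examples above.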

> But this isn't the main memory usage anyway with only about 90'000
> calls to $IDENTIFIER.text.
>
>
> So I guess my best alternative is using the Java target. But I'm still
> very open to other suggestions where I might waste memory...

Your splitting routine does not seem to be a good algorithm, but why do
you end up with 530,000 lines of C code in the first place? Perhaps you
should start there. However, if you are able to deal with the Java
version, I suspect you will find it easier.

Jim

>
>
> Thanks for your time
> Marco
>
>
> > The $text (in the C target) is a convenience method that is
> > relatively slow and inefficient; it is just there when you don't
> > really care that much about those factors. This catches so many
> > people that I may abandon it in v4, in favor of functions/macros that
> > give you the information.
> >
> > You can also try 64bit mode, which will raise the 2GB bar.
> >
> > Jim
> >
> >
> >
> >> -----Original Message-----
> >> From: antlr-interest-bounces at antlr.org
> >> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Marco Trudel
> >> Sent: Monday, January 31, 2011 5:37 AM
> >> To: antlr-interest at antlr.org
> >> Subject: [antlr-interest] Memory management of C target
> >>
> >> Dear all
> >>
> >> Does anyone know how the C target handles memory? I noticed that
> >> with very big input (e.g. 530.000 lines of C code) it crashes because
> >> it hits the 2gb process memory limit. Is there something I can tweak
> >> to make it work or do I have to split the input?
> >>
> >> The Java target manages to parse the input if I give the process
> >> 1gb. It even requires only 20 seconds.
> >> Would be great if the C target could also do that. Even better if
> >> the required time would be about half of the one of the Java target
> >> (as I'm used to when the C target can handle the input).
> >>
> >> Thanks
> >> Marco
> >>
> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address