[antlr-interest] Memory management of C target

Marco Trudel marco at mtsystems.ch
Tue Feb 1 05:39:23 PST 2011


Dear Jim

On 31.01.2011 18:43, Jim Idle wrote:
> The C target will be a lot faster than the Java target, but the objects
> that are created are probably bigger. For v4 I plan to reduce that a lot.
> It is probably better to reduce the input though. 530,000 lines of C code
> as input seems a bit of a tall order for anything, even if you parse it.
> The individual input files would be better.

The splitting function I wrote is way slower (> 1 min) than the time the 
Java target needs to parse the whole file (20s).

> Also, I think you were using $text references in your parser and these
> will create hundreds of thousands of string objects that will not be
> released until you release the parser.

You mean like in:

foo
    : IDENTIFIER { printf("\%s\n", $IDENTIFIER.text->chars); }
    ;

This is done about 90'000 times, which is nothing that 2 GB of memory couldn't handle.

> To use the text of an object it is
> better to get the pointer to the input from that object and use the length
> (start and end pointer are stored in the object) so that you make no
> copies or memory allocations.

Like:

foo
    : IDENTIFIER
      { printf("Input length: \%d, start: \%d, end: \%d\n",
           strlen($IDENTIFIER->input->data),
           $IDENTIFIER->start,
           $IDENTIFIER->stop);
      }
    ;

I guess I'm doing something wrong. Though the length is correct, the 
indexes are way out of bounds (example output: Input length: 253, start: 
9338085, end: 9338088).
But this isn't the main memory usage anyway with only about 90'000 calls 
to $IDENTIFIER.text.
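
The values printed above would be consistent with start and stop holding addresses inside the input buffer rather than offsets into it, which fits Jim's remark that "start and end pointer are stored in the object". That reading is an assumption and is not confirmed anywhere in this thread. A minimal sketch of what pointer-based access could look like follows. The span_token struct and the span_length, span_offset and span_print helpers are hypothetical names made up for illustration. They are not part of the ANTLR C runtime, and the sketch assumes 8-bit input with an inclusive stop position.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a token; NOT the ANTLR runtime struct.
 * Assumption: start and stop hold the addresses of the first and the
 * last character of the token inside the (8-bit) input buffer. */
typedef struct {
    intptr_t start;
    intptr_t stop;
} span_token;

/* Number of characters in the token; stop is treated as inclusive. */
static size_t span_length(const span_token *t) {
    assert(t->stop >= t->start);
    return (size_t)(t->stop - t->start + 1);
}

/* Offset of the token relative to the beginning of the input buffer. */
static size_t span_offset(const span_token *t, const char *input) {
    return (size_t)((const char *)t->start - input);
}

/* Print the token text straight from the input buffer.
 * "%.*s" limits the output to span_length() characters, so no copy
 * and no allocation is made. Returns what fprintf returns. */
static int span_print(const span_token *t, FILE *out) {
    return fprintf(out, "%.*s\n", (int)span_length(t),
                   (const char *)t->start);
}
```

If the assumption holds, subtracting the address of the input data from start would give an in-bounds offset, and the 9338085 to 9338088 range above would describe a token of four characters.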


So I guess my best alternative is using the Java target. But I'm still 
very open to other suggestions where I might waste memory...


Thanks for your time
Marco


> The $text (in the C target) is a convenience
> method that is relatively slow and inefficient; it is just there when you
> don't really care that much about those factors. This catches so many
> people that I may abandon it in v4, in favor of functions/macros that give
> you the information.
>
> You can also try 64bit mode, which will raise the 2GB bar.
>
> Jim
>
>
>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of Marco Trudel
>> Sent: Monday, January 31, 2011 5:37 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Memory management of C target
>>
>> Dear all
>>
>> Does anyone know how the C target handles memory? I noticed that with
>> very big input (e.g. 530.000 lines of C code) it crashes because it
>> hits the 2gb process memory limit. Is there something I can tweak to
>> make it work or do I have to split the input?
>>
>> The Java target manages to parse the input if I give the process 1gb.
>> It even requires only 20 seconds.
>> Would be great if the C target could also do that. Even better if the
>> required time would be about half of the one of the Java target (as I'm
>> used to when the C target can handle the input).
>>
>> Thanks
>> Marco
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>> email-address

