[antlr-interest] C runtime Memory Usage
Jim Idle
jimi at temporal-wave.com
Sat Jan 24 09:47:31 PST 2009
Bot Tiger wrote:
> Hello, I am using the C runtime and experiencing heavy memory usage.
>
> I am interpreting an AST tree recursively.
>
> In each interpret call, I am checking the token text by using
> pANTLR3_BASE_TREE->getText()->chars.
> Am I supposed to manually free this when I am done with it?
It is a convenience function that creates (and tracks) a memory
structure that copies the string from the input every time you call
getText(). It is this way because (and I think this is in the docs)
sometimes you want a new copy of the input string and sometimes you want
to modify the string you get back but not modify the input.
So, you need to do one of the following:
Call getText() once and then cache the result so you don't keep creating
copies of the input token;
Create your own function that you can just pass the token/node pointer
to and that uses the input stream directly (this is all the getText()
call does). By the way, you usually use the $X.text reference, not
getText() directly.
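A minimal sketch of the first option (call once, then cache), in plain C. The Node type, copy_text(), node_text() and node_free_text() are made-up names for illustration and are not part of the ANTLR3 C runtime; copy_text() just stands in for any call that hands back a fresh copy each time, the way getText() does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type, NOT the ANTLR3 tree node. */
typedef struct {
    const char *input;       /* shared input buffer                    */
    size_t      start;       /* offset of the node's text in the input */
    size_t      len;         /* length of the node's text              */
    char       *cached_text; /* NULL until the text is first requested */
} Node;

static int copies_made = 0;  /* counts allocations, for demonstration only */

/* Stand-in for a getText()-style call: allocates a new copy every time. */
static char *copy_text(const Node *n)
{
    char *s = malloc(n->len + 1);
    assert(s != NULL);
    memcpy(s, n->input + n->start, n->len);
    s[n->len] = '\0';
    copies_made++;
    return s;
}

/* Cached accessor: copies on the first call, reuses the copy afterwards. */
static const char *node_text(Node *n)
{
    if (n->cached_text == NULL) {
        n->cached_text = copy_text(n);
    }
    return n->cached_text;
}

/* Release the cached copy when the node is no longer needed. */
static void node_free_text(Node *n)
{
    free(n->cached_text);
    n->cached_text = NULL;
}
```

However many times the interpreter visits the node, node_text() allocates once. In the real runtime you would hang the cached pointer off one of the user fields rather than a struct of your own.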
These convenience functions are not intended for use in things like
interpreters as every time you reference $X.text you will create a new
instance of the string, and though the runtime tracks these references
and releases them when you free the recognizer, you will quickly use
huge amounts of memory. So, you need your own function that makes more
optimal use of memory. All you need do is look at the code for
getText(). You will see that it finds the start position in the input
stream from the start position of the first token spanned by the node
and the end position in the input stream by looking at the end position
of the last token spanned by the node. If you know that there is only
one token in the node, then you can just use that directly. The main
point however, is that you won't allocate memory or copy any strings
this way, you will have a pointer into the input stream directly (which
is already allocated of course) and you will know how many characters
that pointer represents. So long as you are not going to manipulate the
string, you can use it directly in place.
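The pointer-plus-length idea described above can be sketched roughly as follows. This is a standalone illustration, not runtime code: Token, TextSpan, span_of() and span_equals() are hypothetical names, and I am assuming each token records its start and its inclusive stop as offsets into the input buffer. Check the actual token fields in the runtime headers before adapting it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token: offsets into the input, `stop` is inclusive
   (index of the last character). This is an assumption, not the ANTLR3 type. */
typedef struct {
    size_t start;
    size_t stop;
} Token;

/* A view into the input buffer: no allocation, no copy, not NUL-terminated. */
typedef struct {
    const char *ptr;
    size_t      len;
} TextSpan;

/* Text spanned by a node: from the start of its first token
   to the end of its last token. */
static TextSpan span_of(const char *input, const Token *first, const Token *last)
{
    TextSpan s;
    assert(last->stop >= first->start);
    s.ptr = input + first->start;
    s.len = last->stop - first->start + 1;
    return s;
}

/* Compare a span against a C string without copying either one. */
static int span_equals(TextSpan s, const char *literal)
{
    size_t n = strlen(literal);
    return s.len == n && memcmp(s.ptr, literal, n) == 0;
}
```

For a node with a single token, pass the same token as both first and last. Because the span is not NUL-terminated, print it with printf("%.*s", (int)s.len, s.ptr) rather than "%s".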
However, for an interpreter, you probably want to build a table of
string literals at parse time and create nodes that just reference them
in a string table. There are user definable pointers and integers
available in nodes and tokens, specifically to make it easy to do this.
So you could use the runtime string tables like this in the parser:
    parser grammar fred;
    ...
    literal:
        s=STRING_LITERAL { $s.custom = $s.text; } // Store the pANTLR3_STRING for use in the interpreter
        ;
Or, you could pin the general text reference into the tokText union in
the token, if you are willing to dig into the code.
However, for an interpreter, you are probably best building a string
table of your own choosing, one that does not copy the input text unless
there is some underlying reason that you must.
That is where all your memory is going.
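A string table of your own could look something like the sketch below. It is only one possible shape, with made-up names (StringTable, table_init(), intern()), a fixed capacity and a linear search to keep it short; a real interpreter would more likely use a hash table. The entries point into the input buffer instead of copying it, and the returned id is the kind of value you could keep in one of the user integer fields of a node or token.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STRINGS 256   /* arbitrary capacity for this sketch */

/* Hypothetical string table: each entry is a view into the input buffer. */
typedef struct {
    const char *ptr[MAX_STRINGS];
    size_t      len[MAX_STRINGS];
    int         count;
} StringTable;

static void table_init(StringTable *t)
{
    t->count = 0;
}

/* Return the id of the given text, adding it if it is not already present.
   Returns -1 if the table is full. Nothing is allocated or copied. */
static int intern(StringTable *t, const char *ptr, size_t len)
{
    int i;
    assert(ptr != NULL);
    for (i = 0; i < t->count; i++) {
        if (t->len[i] == len && memcmp(t->ptr[i], ptr, len) == 0) {
            return i;                 /* already in the table */
        }
    }
    if (t->count == MAX_STRINGS) {
        return -1;                    /* table full */
    }
    t->ptr[t->count] = ptr;
    t->len[t->count] = len;
    return t->count++;
}
```

Identical literals get the same id, so the interpreter can compare strings by comparing integers, and the only memory used beyond the input itself is the table.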
>
> Also, I am recursively calling the children nodes with
> pANTLR3_BASE_TREE->getChild().
> Am I also supposed to free these as well? I was assuming that they
> returned only 1 copy of the node.
It just gives you a pointer to the child; it does not duplicate it. It
is your string handling that you need to rethink :-) I should probably
make this more explicit in the documentation. However, I think that
there are comments in the examples that tell you all about this. See the
polydiff example.
Jim