[antlr-interest] getText() of C runtime.
Jim Idle
jimi at temporal-wave.com
Tue Sep 7 12:40:33 PDT 2010
Sorry, that is antlr.markmail.org
Jim
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Jim Idle
> Sent: Tuesday, September 07, 2010 12:38 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] getText() of C runtime.
>
> Please consult markmail.antlr.org, where I answer this question numerous
> times ;-), the documentation of the API, or the code. I am contemplating
> just getting rid of it and having C programmers just use the token to
> build the string in whatever way they want.
>
> The STRING stuff is meant as an aid and is not useful if you want to
> parse lots of things. Also, it is not a leak, as it auto-tracks the memory
> and releases it when you free the tree walker. It is basically the support
> for $text. It gets a new copy at each reference because I cannot know what
> you did with the last copy. So, you must store the pointer if you want to
> reuse it.
>
> However, if you want something more efficient, then you must use the
> token struct directly, which will give you pointers directly to the text
> in the input. The demo C parser in the downloadable examples shows some
> manipulation of this, but it is just a pointer to the start of the text
> and a pointer to the end of the text. Assuming that you know the encoding
> of your input, then you have everything you need. If you are not
> manipulating the text, then you can use it directly without copying it,
> as in the downloadable examples.
>
> Jim
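As a rough illustration of the direct-pointer approach described in the reply above, the following sketch works on a start pointer and an end pointer into the input buffer without allocating anything. TokenSpan and the span_* helpers are hypothetical stand-ins, not ANTLR C runtime types. The sketch assumes 8-bit text and assumes that the end pointer refers to the last character of the token (inclusive); both assumptions should be checked against the runtime and input encoding actually in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the start/end pointers a token exposes. */
typedef struct {
    const char *start; /* first byte of the token text in the input buffer */
    const char *stop;  /* last byte of the token text (assumed inclusive)  */
} TokenSpan;

/* Number of bytes covered by the span. */
static size_t span_length(const TokenSpan *t)
{
    assert(t != NULL);
    return (size_t)(t->stop - t->start) + 1;
}

/* Compare the token text with a NUL-terminated literal, in place. */
static int span_equals(const TokenSpan *t, const char *literal)
{
    size_t n = span_length(t);
    return strlen(literal) == n && memcmp(t->start, literal, n) == 0;
}

/* Copy the text into a caller-owned buffer, truncating if needed.
 * Returns the number of bytes written, not counting the NUL. */
static size_t span_copy(const TokenSpan *t, char *dst, size_t cap)
{
    size_t n;
    if (cap == 0) {
        return 0;
    }
    n = span_length(t);
    if (n >= cap) {
        n = cap - 1;
    }
    memcpy(dst, t->start, n);
    dst[n] = '\0';
    return n;
}
```

With helpers like these, a tree-walking interpreter can test a node's text with span_equals() and only call span_copy() when it really needs an owned, NUL-terminated string.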
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > bounces at antlr.org] On Behalf Of Kenneth Domino
> > Sent: Tuesday, September 07, 2010 12:23 PM
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] getText() of C runtime.
> >
> > Hi All,
> >
> > I'm using the C runtime of an Antlr-generated parser. I noticed a
> > huge memory leak in my code, but it turns out it's because I call the
> > function getText() (defined in antlr3commontoken.c of the Antlr C
> > runtime) quite a bit, on tree nodes, during my hand-coded tree-walking
> > interpreter. Apparently, getText() creates a new copy of the string
> > every time. E.g.:
> >
> > pANTLR3_BASE_TREE node = ...;
> > pANTLR3_STRING text = node->getText(node);
> > // text2 is another malloc'ed buffer containing the same string for node.
> > pANTLR3_STRING text2 = node->getText(node);
> >
> > However, if you read the source code, it obviously intends to do some
> > memoizing, because it takes into consideration "token->textState",
> > where the previously computed value is returned for ANTLR3_TEXT_STRING.
> > I can, of course, and probably will, create a string table wrapper for
> > getText(). But I'm wondering if anyone knows of some way of hooking into
> > this part of the API so that I don't have to.
> >
> > Ken
> >
> > The source for the runtime function is:
> >
> > static pANTLR3_STRING getText (pANTLR3_COMMON_TOKEN token)
> > {
> >     switch (token->textState)
> >     {
> >         case ANTLR3_TEXT_STRING:
> >
> >             // Someone already created a string for this token, so we just
> >             // use it.
> >             //
> >             return token->tokText.text;
> >             break;
> >
> >         case ANTLR3_TEXT_CHARP:
> >
> >             // We had a straight text pointer installed, now we
> >             // must convert it to a string. Note we have to do this here
> >             // or otherwise setText8() will just install the same char*
> >             //
> >             if (token->strFactory != NULL)
> >             {
> >                 token->tokText.text =
> >                     token->strFactory->newStr8(token->strFactory,
> >                         (pANTLR3_UINT8)token->tokText.chars);
> >                 token->textState = ANTLR3_TEXT_STRING;
> >                 return token->tokText.text;
> >             }
> >             else
> >             {
> >                 // We cannot do anything here
> >                 //
> >                 return NULL;
> >             }
> >             break;
> >
> >         default:
> >
> >             // EOF is a special case
> >             //
> >             if (token->type == ANTLR3_TOKEN_EOF)
> >             {
> >                 token->tokText.text =
> >                     token->strFactory->newStr8(token->strFactory,
> >                         (pANTLR3_UINT8)"<EOF>");
> >                 token->textState = ANTLR3_TEXT_STRING;
> >                 return token->tokText.text;
> >             }
> >
> >             // We had nothing installed in the token, create a new string
> >             // from the input stream
> >             //
> >             if (token->input != NULL)
> >             {
> >                 ////// The following code does a malloc/string copy
> >                 ////// every time I call getText.
> >                 return token->input->substr(token->input,
> >                                             token->getStartIndex(token),
> >                                             token->getStopIndex(token));
> >             }
> >
> >             // Nothing to return, there is no input stream
> >             //
> >             return NULL;
> >             break;
> >     }
> > }
> >
> >
> >
> >
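A minimal sketch of the "store the pointer" advice, and of the string table wrapper mentioned in the question: a small cache keyed by node address that invokes the expensive getter only once per node. TextCache, cache_init, cache_get_text and demo_fetch are hypothetical names and are not part of the ANTLR C runtime; demo_fetch merely stands in for a call such as node->getText(node). The sketch assumes the returned pointers stay valid for as long as the cache is used (for example, until the tree walker is freed) and uses a fixed-size table for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64 /* hypothetical fixed capacity; must be a power of two */

/* Signature of the expensive "get text for this node" call being wrapped. */
typedef const char *(*TextFn)(const void *node);

typedef struct {
    const void *key[CACHE_SLOTS];  /* node address, NULL when the slot is empty */
    const char *text[CACHE_SLOTS]; /* text returned for that node */
    TextFn      fetch;             /* wrapped getter */
} TextCache;

static void cache_init(TextCache *c, TextFn fetch)
{
    size_t i;
    for (i = 0; i < CACHE_SLOTS; i++) {
        c->key[i] = NULL;
        c->text[i] = NULL;
    }
    c->fetch = fetch;
}

/* Return the cached text for node, calling fetch only on the first request. */
static const char *cache_get_text(TextCache *c, const void *node)
{
    size_t i, slot;
    assert(c != NULL && node != NULL);
    slot = (size_t)(((uintptr_t)node >> 4) & (CACHE_SLOTS - 1));
    for (i = 0; i < CACHE_SLOTS; i++) {
        size_t s = (slot + i) & (CACHE_SLOTS - 1); /* linear probing */
        if (c->key[s] == node) {
            return c->text[s];                     /* hit: reuse the pointer */
        }
        if (c->key[s] == NULL) {                   /* miss: fetch once, store */
            c->key[s] = node;
            c->text[s] = c->fetch(node);
            return c->text[s];
        }
    }
    return c->fetch(node); /* table full: fall back to an uncached call */
}

/* Demo getter: treats the node as its own text and counts invocations. */
static int fetch_calls = 0;
static const char *demo_fetch(const void *node)
{
    fetch_calls++;
    return (const char *)node;
}
```

Asking the cache for the same node twice triggers one call to the wrapped getter; a real interpreter would want a growable table, or a per-node slot if the tree type offers one.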
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> > email-address