[antlr-interest] getText() of C runtime.
Kenneth Domino
kenneth.domino at domemtech.com
Tue Sep 7 12:22:58 PDT 2010
Hi All,
I'm using the C runtime of an Antlr-generated parser. I noticed a huge
memory leak in my code, and it turns out the cause is that I call
getText() (defined in antlr3commontoken.c of the Antlr C runtime) quite
a bit on tree nodes in my hand-coded tree-walking interpreter.
Apparently, getText() creates a new copy of the string every time. E.g.:
    pANTLR3_BASE_TREE node = ...;
    pANTLR3_STRING text = node->getText(node);
    // text2 is another malloc'ed buffer containing the same string for node.
    pANTLR3_STRING text2 = node->getText(node);
However, if you read the source code, it obviously intends to do some
memoizing, because it takes "token->textState" into consideration: for
ANTLR3_TEXT_STRING, the previously computed value is returned. I can, of
course, and probably will, create a string table wrapper for getText().
But I'm wondering whether anyone knows of a way to hook into this part
of the API so that I don't have to.
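
A minimal sketch of the kind of string table wrapper mentioned above. It does not use the ANTLR headers: text_cache, text_cache_get, text_fetch_fn and demo_fetch are hypothetical names invented for this illustration, and the fetch callback merely stands in for a call such as node->getText(node). In the real runtime the cached value would be the pANTLR3_STRING returned by getText() rather than a char *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for node->getText(node): returns a freshly
 * allocated copy of the node's text on every call. */
typedef char *(*text_fetch_fn)(const void *node);

#define TEXT_CACHE_SLOTS 1024u  /* power of two; fixed size for brevity */

typedef struct {
    const void *key[TEXT_CACHE_SLOTS];   /* node pointer, NULL = empty slot */
    char       *text[TEXT_CACHE_SLOTS];  /* cached copy, owned by the cache */
    size_t      used;
} text_cache;

static void text_cache_init(text_cache *c)
{
    for (size_t i = 0; i < TEXT_CACHE_SLOTS; i++) {
        c->key[i] = NULL;
        c->text[i] = NULL;
    }
    c->used = 0;
}

static size_t text_cache_slot(const void *node)
{
    uintptr_t h = (uintptr_t)node;
    h ^= h >> 9;  /* mix in higher bits; low bits are often zero (alignment) */
    return (size_t)(h & (TEXT_CACHE_SLOTS - 1u));
}

/* Return the cached text for node; fetch is called only on the first
 * request. Returns NULL when the table is full, so the caller can fall
 * back to an uncached call. */
static const char *text_cache_get(text_cache *c, const void *node,
                                  text_fetch_fn fetch)
{
    size_t i;
    assert(node != NULL);
    i = text_cache_slot(node);
    while (c->key[i] != NULL) {          /* linear probing */
        if (c->key[i] == node)
            return c->text[i];
        i = (i + 1u) & (TEXT_CACHE_SLOTS - 1u);
    }
    if (c->used + 1u >= TEXT_CACHE_SLOTS)
        return NULL;                     /* always keep one slot empty */
    c->key[i] = node;
    c->text[i] = fetch(node);
    c->used++;
    return c->text[i];
}

static void text_cache_free(text_cache *c)
{
    for (size_t i = 0; i < TEXT_CACHE_SLOTS; i++)
        free(c->text[i]);
    text_cache_init(c);
}

/* Demo only: here the "node" is just a C string, copied on every call. */
static int demo_fetch_calls = 0;

static char *demo_fetch(const void *node)
{
    const char *src = (const char *)node;
    char *copy = malloc(strlen(src) + 1u);
    demo_fetch_calls++;
    if (copy != NULL)
        strcpy(copy, src);
    return copy;
}
```

Keying on the node pointer keeps the lookup cheap, but it assumes nodes are not freed and reallocated while the cache is alive; the cache should be freed together with the tree.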
Ken
The source for the runtime function is:
static pANTLR3_STRING getText (pANTLR3_COMMON_TOKEN token)
{
    switch (token->textState)
    {
        case ANTLR3_TEXT_STRING:

            // Someone already created a string for this token, so we just
            // use it.
            //
            return token->tokText.text;
            break;

        case ANTLR3_TEXT_CHARP:

            // We had a straight text pointer installed, now we
            // must convert it to a string. Note we have to do this here
            // or otherwise setText8() will just install the same char*
            //
            if (token->strFactory != NULL)
            {
                token->tokText.text = token->strFactory->newStr8(token->strFactory,
                                          (pANTLR3_UINT8)token->tokText.chars);
                token->textState = ANTLR3_TEXT_STRING;
                return token->tokText.text;
            }
            else
            {
                // We cannot do anything here
                //
                return NULL;
            }
            break;

        default:

            // EOF is a special case
            //
            if (token->type == ANTLR3_TOKEN_EOF)
            {
                token->tokText.text = token->strFactory->newStr8(token->strFactory,
                                          (pANTLR3_UINT8)"<EOF>");
                token->textState = ANTLR3_TEXT_STRING;
                return token->tokText.text;
            }

            // We had nothing installed in the token, create a new string
            // from the input stream
            //
            if (token->input != NULL)
            {
                ////////// The following code does a malloc/string copy
                ////////// every time I call getText.
                return token->input->substr( token->input,
                                             token->getStartIndex(token),
                                             token->getStopIndex(token)
                                           );
            }

            // Nothing to return, there is no input stream
            //
            return NULL;
            break;
    }
}
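
For illustration only, the memoization that the ANTLR3_TEXT_STRING case seems to anticipate could look roughly like the sketch below: the default branch stores the substring it creates and flips the state, so later calls return the stored value. This is not the runtime's code and not a tested patch against it. mock_token, mock_substr and mock_get_text are hypothetical stand-ins for the token structure, input->substr() and getText(), written so the example compiles without the ANTLR headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mock types only: these are NOT the ANTLR3 structures. */
enum mock_text_state { MOCK_TEXT_NONE, MOCK_TEXT_STRING };

typedef struct {
    enum mock_text_state text_state;  /* plays the role of textState */
    char       *text;                 /* plays the role of tokText.text */
    const char *input;                /* plays the role of the input stream */
    size_t      start;                /* index of the first character */
    size_t      stop;                 /* index of the last character (inclusive) */
} mock_token;

static int mock_substr_calls = 0;

/* Stand-in for input->substr(): allocates a copy of input[start..stop]. */
static char *mock_substr(const char *input, size_t start, size_t stop)
{
    size_t len = stop - start + 1u;
    char *copy = malloc(len + 1u);
    mock_substr_calls++;
    if (copy != NULL) {
        memcpy(copy, input + start, len);
        copy[len] = '\0';
    }
    return copy;
}

/* Memoizing variant of the default branch: the first call copies the
 * text and records it in the token; later calls return the same pointer. */
static const char *mock_get_text(mock_token *token)
{
    assert(token != NULL);
    if (token->text_state == MOCK_TEXT_STRING)
        return token->text;           /* already computed, reuse it */
    if (token->input == NULL)
        return NULL;                  /* no input stream, nothing to return */
    token->text = mock_substr(token->input, token->start, token->stop);
    if (token->text != NULL)
        token->text_state = MOCK_TEXT_STRING;
    return token->text;
}
```

With a token covering characters 4 to 5 of the input "x = 42;", two calls to mock_get_text() return the same pointer and mock_substr() runs only once. The trade-off is that the stored text goes stale if the token's start or stop index is changed afterwards.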