[antlr-interest] getText() of C runtime.

Kenneth Domino kenneth.domino at domemtech.com
Tue Sep 7 12:22:58 PDT 2010


Hi All,

I'm using the C runtime of an ANTLR-generated parser. I noticed a huge
memory leak in my code, and it turns out to be because I call getText()
(defined in antlr3commontoken.c of the ANTLR C runtime) quite a bit on
tree nodes in my hand-coded tree-walking interpreter. Apparently,
getText() creates a new copy of the string every time. E.g.:

pANTLR3_BASE_TREE node = ...;
pANTLR3_STRING text  = node->getText(node);
// text2 is another malloc'ed string containing the same text for node.
pANTLR3_STRING text2 = node->getText(node);

However, if you read the source code, it obviously intends to do some
memoizing: it checks token->textState and, for ANTLR3_TEXT_STRING,
returns the previously computed value. I can, of course, and probably
will, create a string table wrapper for getText(). But I'm wondering if
anyone knows of a way to hook into this part of the API so that I don't
have to.
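
A string table wrapper of the kind mentioned above might look like the
following sketch. It is self-contained C and does not include the ANTLR
headers: text_cache, cached_get_text and demo_fetch are hypothetical
names, not part of the runtime, and the fetch callback stands in for a
small adapter around node->getText(node). The table is a fixed-size,
open-addressing map keyed on the node pointer, so each node's text is
fetched (and allocated) at most once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEXT_CACHE_SLOTS 1024            /* must be a power of two */

/* Hypothetical callback: stands in for an adapter around getText(). */
typedef const char *(*text_fn)(void *node);

/* Pointer-keyed cache; must be zero-initialized before first use. */
typedef struct text_cache {
    void       *keys[TEXT_CACHE_SLOTS];  /* NULL marks an empty slot */
    const char *vals[TEXT_CACHE_SLOTS];
    size_t      used;
} text_cache;

/* Hash a pointer into a slot index. */
static size_t slot_for(const void *p)
{
    uintptr_t h = ((uintptr_t)p >> 4) * (uintptr_t)2654435761u;
    return (size_t)h & (TEXT_CACHE_SLOTS - 1);
}

/* Return the cached text for node, calling fetch only on a miss. */
static const char *cached_get_text(text_cache *cache, void *node, text_fn fetch)
{
    assert(cache != NULL && node != NULL && fetch != NULL);

    size_t i = slot_for(node);
    for (size_t probes = 0; probes < TEXT_CACHE_SLOTS; probes++) {
        if (cache->keys[i] == node) {
            return cache->vals[i];       /* hit: no new allocation */
        }
        if (cache->keys[i] == NULL) {    /* miss: fetch once and remember */
            cache->keys[i] = node;
            cache->vals[i] = fetch(node);
            cache->used++;
            return cache->vals[i];
        }
        i = (i + 1) & (TEXT_CACHE_SLOTS - 1);   /* linear probing */
    }
    return fetch(node);                  /* table full: uncached fallback */
}

/* Demo fetcher for illustration only: treats the node as its own text
 * and counts how often it is called. */
static int demo_fetch_calls = 0;

static const char *demo_fetch(void *node)
{
    demo_fetch_calls++;
    return (const char *)node;
}
```

The sketch never evicts or frees entries, so it only bounds the
allocations to one per node; the cached strings would still have to be
released along with whatever owns them.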

Ken

The source for the runtime function is:

static  pANTLR3_STRING  getText         (pANTLR3_COMMON_TOKEN token)
{
    switch (token->textState)
    {
        case ANTLR3_TEXT_STRING:

            // Someone already created a string for this token, so we just
            // use it.
            //
            return  token->tokText.text;
            break;

        case ANTLR3_TEXT_CHARP:

            // We had a straight text pointer installed, now we
            // must convert it to a string. Note we have to do this here
            // or otherwise setText8() will just install the same char*
            //
            if  (token->strFactory != NULL)
            {
                token->tokText.text =
                    token->strFactory->newStr8(token->strFactory,
                                               (pANTLR3_UINT8)token->tokText.chars);
                token->textState    = ANTLR3_TEXT_STRING;
                return token->tokText.text;
            }
            else
            {
                // We cannot do anything here
                //
                return NULL;
            }
            break;

        default:

            // EOF is a special case
            //
            if (token->type == ANTLR3_TOKEN_EOF)
            {
                token->tokText.text =
                    token->strFactory->newStr8(token->strFactory,
                                               (pANTLR3_UINT8)"<EOF>");
                token->textState    = ANTLR3_TEXT_STRING;
                return token->tokText.text;
            }


            // We had nothing installed in the token, create a new string
            // from the input stream
            //

            if  (token->input != NULL)
            {

                ////////// The following code does a malloc/string copy
                ////////// every time I call getText.
                return  token->input->substr(   token->input,
                                                token->getStartIndex(token),
                                                token->getStopIndex(token)
                                            );
            }

            // Nothing to return, there is no input stream
            //
            return NULL;
            break;
    }
}
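
For comparison, here is a minimal sketch of what memoizing the default
branch could look like. It uses simplified stand-in types rather than
the real runtime structures: mock_token, mock_substr and mock_get_text
are hypothetical, and the fields only mirror textState, tokText.text
and the input stream loosely. The idea is to store the substring the
first time it is built and flip the state, so later calls take the
"already a string" path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the runtime's text states. */
enum text_state { TEXT_NONE, TEXT_STRING };

/* Simplified model of a token; not the real ANTLR3_COMMON_TOKEN. */
typedef struct mock_token {
    enum text_state textState;   /* loosely mirrors token->textState     */
    char           *text;        /* loosely mirrors token->tokText.text  */
    const char     *input;       /* models the input stream's buffer     */
    size_t          start, stop; /* inclusive character indexes          */
    int             substr_calls;/* allocation counter, for the demo     */
} mock_token;

/* Models input->substr(): allocates a fresh copy on every call. */
static char *mock_substr(mock_token *tok)
{
    size_t len  = tok->stop - tok->start + 1;
    char  *copy = malloc(len + 1);

    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, tok->input + tok->start, len);
    copy[len] = '\0';
    tok->substr_calls++;
    return copy;
}

/* getText with the default branch memoized. */
static const char *mock_get_text(mock_token *tok)
{
    assert(tok != NULL);

    if (tok->textState == TEXT_STRING) {
        return tok->text;        /* already computed: reuse it */
    }
    if (tok->input == NULL) {
        return NULL;             /* nothing to build the text from */
    }
    tok->text = mock_substr(tok);
    if (tok->text != NULL) {
        tok->textState = TEXT_STRING;   /* remember for later calls */
    }
    return tok->text;
}

/* Release the cached text and reset the token's state. */
static void mock_token_free(mock_token *tok)
{
    free(tok->text);
    tok->text      = NULL;
    tok->textState = TEXT_NONE;
}
```

Whether the real runtime can safely cache the result this way depends
on how it manages the lifetime of the returned string, which this
sketch does not model.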




