[antlr-interest] getText() of C runtime.

Jim Idle jimi at temporal-wave.com
Tue Sep 7 12:38:12 PDT 2010


Please consult markmail.antlr.org, where I have answered this question
numerous times ;-), the documentation of the API, or the code. I am
contemplating getting rid of it altogether and having C programmers use the
token to build the string in whatever way they want.

The STRING stuff is meant as an aid and is not useful if you want to parse
lots of things. Also, it is not a leak, as it automatically tracks the memory
and releases it when you free the tree walker. It is basically the support
for $text. It gets a new copy at each reference because I cannot know what
you did with the last copy. So, you must store the pointer if you want to
reuse it.
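
As a rough illustration of "store the pointer", here is a minimal sketch of a
small lookup table keyed by node address. It is not part of the ANTLR runtime:
text_fn, text_cache, cached_text and demo_fetch are hypothetical names, and
text_fn merely stands in for a call such as node->getText(node). The table is
a fixed size and never evicts, which is an assumption made to keep the example
short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a call such as node->getText(node): any function
 * that hands back a freshly built text object each time it is called. */
typedef const void *(*text_fn)(const void *node);

#define TEXT_CACHE_SLOTS 1024u  /* power of two; illustrative size only */

/* Open-addressing table mapping a node address to the text pointer that was
 * fetched for it. Zero-initialise before use (e.g. declare it static). */
typedef struct {
    const void *node[TEXT_CACHE_SLOTS];
    const void *text[TEXT_CACHE_SLOTS];
    size_t      used;
} text_cache;

/* Return the text for node, calling fetch only the first time node is seen. */
static const void *cached_text(text_cache *c, const void *node, text_fn fetch)
{
    assert(c != NULL && node != NULL && fetch != NULL);

    size_t i = (size_t)(((uintptr_t)node >> 4) & (TEXT_CACHE_SLOTS - 1u));

    /* Linear probing. One slot is always left empty, so this terminates. */
    while (c->node[i] != NULL && c->node[i] != node)
        i = (i + 1u) & (TEXT_CACHE_SLOTS - 1u);

    if (c->node[i] == NULL) {
        if (c->used >= TEXT_CACHE_SLOTS - 1u)
            return fetch(node);          /* table full: fall back, uncached */
        c->node[i] = node;
        c->text[i] = fetch(node);
        c->used++;
    }
    return c->text[i];
}

/* Demo fetch function: counts its calls and returns the node itself as the
 * "text", so the effect of the cache can be observed. */
static int fetch_calls;

static const void *demo_fetch(const void *node)
{
    fetch_calls++;
    return node;
}
```

With this in place, asking for the same node twice calls the fetch function
once. The cached pointers are only valid for as long as whatever owns the
text keeps it alive, so the table should be discarded before the tree walker
is freed.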

However, if you want something more efficient, then you must use the token
struct directly, which will give you pointers directly to the text in the
input. The demo C parser in the downloadable examples shows some
manipulation of this, but it is just a pointer to the start of the text and
a pointer to the end of the text. Assuming that you know the encoding of
your input, then you have everything you need. If you are not manipulating
the text, then you can use it directly without copying it, as in the
downloadable examples.
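
To make the start/end pointer idea concrete, here is a minimal sketch that
does not use the runtime headers at all. text_span is a hypothetical stand-in
for the two pointers a token gives you, and span_length, span_equals and
span_copy are illustrative helpers, not runtime functions. It assumes 8-bit
input and that the end pointer addresses the last character of the text
(inclusive); check both against your own input stream.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two pointers carried by a token. */
typedef struct {
    const char *start;  /* first byte of the token text                   */
    const char *stop;   /* last byte of the token text (assumed inclusive) */
} text_span;

/* Number of bytes in the span. Requires a non-empty span. */
static size_t span_length(text_span s)
{
    assert(s.start != NULL && s.stop != NULL && s.stop >= s.start);
    return (size_t)(s.stop - s.start) + 1u;
}

/* Compare the span with a NUL-terminated literal without copying anything. */
static int span_equals(text_span s, const char *literal)
{
    size_t n = span_length(s);
    return strlen(literal) == n && memcmp(s.start, literal, n) == 0;
}

/* Copy the span into a caller-owned buffer and NUL-terminate it.
 * Returns 0 on success, -1 if the buffer is missing or too small. */
static int span_copy(text_span s, char *dst, size_t dst_size)
{
    size_t n = span_length(s);
    if (dst == NULL || dst_size < n + 1u)
        return -1;
    memcpy(dst, s.start, n);
    dst[n] = '\0';
    return 0;
}
```

Comparisons such as span_equals(tok, "while") read the input buffer in place,
and span_copy is only needed when you want to modify the text or keep it
after the input stream has been closed.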

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Kenneth Domino
> Sent: Tuesday, September 07, 2010 12:23 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] getText() of C runtime.
> 
> Hi All,
> 
> I'm using the C runtime of an Antlr-generated parser.  I noticed a huge
> memory leak in my code, but it turns out it's because I call function
> getText() (def'ed in antlr3commontoken.c of the Antlr C runtime) quite a
> bit, on tree nodes during my hand-coded tree walking interpreter.
> Apparently, getText() creates a new copy of the string every time. Eg:
> 
> pANTLR3_BASE_TREE node = ...;
> char * text = node->getText(node);
> char * text2 = node->getText(node); // text2 is another malloc'ed buffer
>                                     // containing the same string for node.
> 
> However, if you read the source code, it obviously intends to do some
> memoizing, because it takes into consideration "token->textState", where
> the previous value computed is returned for ANTLR3_TEXT_STRING.  I can, of
> course, and probably will, create a string table wrapper for getText().
> But I'm wondering if anyone knows if there is some way of hooking into
> this part of the API so that I don't have to.
> 
> Ken
> 
> The source for the runtime function is:
> 
> static  pANTLR3_STRING  getText         (pANTLR3_COMMON_TOKEN token)
> {
>     switch (token->textState)
>     {
>         case ANTLR3_TEXT_STRING:
> 
>             // Someone already created a string for this token, so we just
>             // use it.
>             //
>             return  token->tokText.text;
>             break;
> 
>         case ANTLR3_TEXT_CHARP:
> 
>             // We had a straight text pointer installed, now we
>             // must convert it to a string. Note we have to do this here
>             // or otherwise setText8() will just install the same char*
>             //
>             if  (token->strFactory != NULL)
>             {
>                 token->tokText.text =
>                     token->strFactory->newStr8(token->strFactory,
>                         (pANTLR3_UINT8)token->tokText.chars);
>                 token->textState    = ANTLR3_TEXT_STRING;
>                 return token->tokText.text;
>             }
>             else
>             {
>                 // We cannot do anything here
>                 //
>                 return NULL;
>             }
>             break;
> 
>         default:
> 
>             // EOF is a special case
>             //
>             if (token->type == ANTLR3_TOKEN_EOF)
>             {
>                 token->tokText.text =
>                     token->strFactory->newStr8(token->strFactory,
>                         (pANTLR3_UINT8)"<EOF>");
>                 token->textState    = ANTLR3_TEXT_STRING;
>                 return token->tokText.text;
>             }
> 
> 
>             // We had nothing installed in the token, create a new string
>             // from the input stream
>             //
> 
>             if  (token->input != NULL)
>             {
> 
>                 // The following code does a malloc/string copy every
>                 // time I call getText.
>                 //
>                 return  token->input->substr(   token->input,
>                                                 token->getStartIndex(token),
>                                                 token->getStopIndex(token)
>                                             );
>             }
> 
>             // Nothing to return, there is no input stream
>             //
>             return NULL;
>             break;
>     }
> }
> 
> 
> 
> 


