[antlr-interest] getText() of C runtime.

Jim Idle jimi at temporal-wave.com
Tue Sep 7 12:40:33 PDT 2010


Sorry, that is antlr.markmail.org 

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
> Sent: Tuesday, September 07, 2010 12:38 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] getText() of C runtime.
> 
> Please consult markmail.antlr.org, where I answer this question numerous
> times ;-), the documentation of the API, or the code. I am contemplating
> just getting rid of it and having C programmers just use the token to
> build the string in whatever way they want.
> 
> The STRING stuff is meant as an aid and is not useful if you want to
> parse lots of things. Also, it is not a leak, as it automatically tracks
> the memory and releases it when you free the tree walker. It is basically
> the support for $text. It gets a new copy at each reference because I
> cannot know what you did with the last copy. So, you must store the
> pointer if you want to reuse it.
> 
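The "store the pointer" advice above can be sketched without the ANTLR runtime. In this minimal sketch, `fresh_copy()` is a hypothetical stand-in for a getText()-style call that hands back a new copy on each reference; it is not part of the runtime's API, and `copies_made` exists only to make the cost visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a getText()-style call: every call
 * allocates and returns a new copy of the text. */
static int copies_made = 0;

static char *fresh_copy(const char *src)
{
    size_t n = strlen(src) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, src, n);
    copies_made++;
    return copy;
}

/* Counts how many entries of `names` equal the node text. The text is
 * fetched once and the stored pointer is reused inside the loop, rather
 * than asking for the text again on every iteration. */
static size_t count_matches(const char *node_text, const char **names, size_t n)
{
    char *text = fresh_copy(node_text);   /* one copy, kept in a local */
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(text, names[i]) == 0)
            hits++;
    }
    free(text);   /* this sketch owns its stand-in copy */
    return hits;
}
```

The `free()` applies only to the stand-in: as described above, the runtime tracks its own strings and releases them when the tree walker is freed.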
> However, if you want something more efficient, then you must use the
> token struct directly, which will give you pointers directly to the text
> in the input. The demo C parser in the downloadable examples shows some
> manipulation of this, but it is just a pointer to the start of the text
> and a pointer to the end of the text. Assuming that you know the encoding
> of your input, then you have everything you need. If you are not
> manipulating the text, then you can use it directly without copying it,
> as in the downloadable examples.
> 
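A rough illustration of working from a start pointer and an end pointer without copying. `token_span` is a hypothetical struct, not the runtime's token type; the sketch assumes a single-byte encoding, a non-empty token, and that `stop` points at the last character of the token (inclusive). Check the actual token struct and the downloadable examples before relying on any of these assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token view: two pointers into the input buffer.
 * Assumption: `stop` points at the last character (inclusive) and the
 * input uses one byte per character. */
typedef struct {
    const char *start;
    const char *stop;
} token_span;

/* Length in bytes of the token text. */
static size_t span_length(token_span t)
{
    assert(t.start != NULL && t.stop != NULL && t.stop >= t.start);
    return (size_t)(t.stop - t.start) + 1;
}

/* Compare the token text with a NUL-terminated literal, reading the
 * input buffer in place; nothing is allocated or copied. */
static int span_equals(token_span t, const char *literal)
{
    size_t n = span_length(t);
    return strlen(literal) == n && memcmp(t.start, literal, n) == 0;
}
```

Since the text is not NUL-terminated inside the input buffer, printing it needs an explicit length, for example `printf("%.*s", (int)span_length(t), t.start)`.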
> Jim
> 
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Kenneth Domino
> > Sent: Tuesday, September 07, 2010 12:23 PM
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] getText() of C runtime.
> >
> > Hi All,
> >
> > I'm using the C runtime of an Antlr-generated parser.  I noticed a
> > huge memory leak in my code, but it turns out it's because I call the
> > function getText() (defined in antlr3commontoken.c of the Antlr C
> > runtime) quite a bit, on tree nodes during my hand-coded tree walking
> > interpreter. Apparently, getText() creates a new copy of the string
> > every time. E.g.:
> >
> > pANTLR3_BASE_TREE node = ...;
> > char * text = node->getText(node);
> > // text2 is another malloc'ed buffer containing the same string for node.
> > char * text2 = node->getText(node);
> >
> > However, if you read the source code, it obviously intends to do some
> > memoizing, because it takes into consideration "token->textState",
> > where the previous value computed is returned for ANTLR3_TEXT_STRING.
> > I can, of course, and probably will, create a string table wrapper for
> > getText(). But I'm wondering if anyone knows if there is some way of
> > hooking into this part of the API so that I don't have to.
> >
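One possible shape for the string table wrapper mentioned above, as a minimal sketch rather than anything from the runtime: a small chained hash table keyed by node address that calls the text-producing function only on the first request for a node. All names here (`text_cache`, `cached_text`, `text_fn`, `demo_fetch`) are hypothetical; in real use the callback would wrap the node's getText() call and return whatever pointer the caller wants to keep.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TEXT_CACHE_BUCKETS 256

/* One cached entry: node address -> text pointer from the first request. */
typedef struct cache_entry {
    const void *node;
    const char *text;
    struct cache_entry *next;
} cache_entry;

typedef struct {
    cache_entry *buckets[TEXT_CACHE_BUCKETS];
} text_cache;

/* Hypothetical callback type standing in for whatever produces the text. */
typedef const char *(*text_fn)(const void *node);

static size_t bucket_of(const void *node)
{
    /* Drop low bits, which tend to be zero for aligned allocations. */
    return (size_t)(((uintptr_t)node >> 4) % TEXT_CACHE_BUCKETS);
}

/* Return the cached text for `node`, calling `fetch` only the first time. */
static const char *cached_text(text_cache *cache, const void *node, text_fn fetch)
{
    size_t b = bucket_of(node);
    for (cache_entry *e = cache->buckets[b]; e != NULL; e = e->next) {
        if (e->node == node)
            return e->text;
    }
    cache_entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->node = node;
    e->text = fetch(node);
    e->next = cache->buckets[b];
    cache->buckets[b] = e;
    return e->text;
}

/* Free the table entries; the text belongs to whoever produced it. */
static void text_cache_clear(text_cache *cache)
{
    for (size_t b = 0; b < TEXT_CACHE_BUCKETS; b++) {
        cache_entry *e = cache->buckets[b];
        while (e != NULL) {
            cache_entry *next = e->next;
            free(e);
            e = next;
        }
        cache->buckets[b] = NULL;
    }
}

/* Demo-only fetch: treats the node pointer itself as the text and counts
 * how often it is called. */
static int fetch_calls = 0;

static const char *demo_fetch(const void *node)
{
    fetch_calls++;
    return (const char *)node;
}
```

Keying on the node address assumes nodes are not freed and reallocated while the cache is alive; clearing the cache together with the tree avoids stale entries.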
> > Ken
> >
> > The source for the runtime function is:
> >
> > static  pANTLR3_STRING  getText         (pANTLR3_COMMON_TOKEN token)
> > {
> >     switch (token->textState)
> >     {
> >         case ANTLR3_TEXT_STRING:
> >
> >             // Someone already created a string for this token, so we just
> >             // use it.
> >             //
> >             return  token->tokText.text;
> >             break;
> >
> >         case ANTLR3_TEXT_CHARP:
> >
> >             // We had a straight text pointer installed, now we
> >             // must convert it to a string. Note we have to do this here
> >             // or otherwise setText8() will just install the same char*
> >             //
> >             if  (token->strFactory != NULL)
> >             {
> >                 token->tokText.text =
> >                     token->strFactory->newStr8(token->strFactory,
> >                         (pANTLR3_UINT8)token->tokText.chars);
> >                 token->textState    = ANTLR3_TEXT_STRING;
> >                 return token->tokText.text;
> >             }
> >             else
> >             {
> >                 // We cannot do anything here
> >                 //
> >                 return NULL;
> >             }
> >             break;
> >
> >         default:
> >
> >             // EOF is a special case
> >             //
> >             if (token->type == ANTLR3_TOKEN_EOF)
> >             {
> >                 token->tokText.text =
> >                     token->strFactory->newStr8(token->strFactory,
> >                         (pANTLR3_UINT8)"<EOF>");
> >                 token->textState    = ANTLR3_TEXT_STRING;
> >                 return token->tokText.text;
> >             }
> >
> >
> >             // We had nothing installed in the token, create a new string
> >             // from the input stream
> >             //
> >
> >             if  (token->input != NULL)
> >             {
> >
> > ////////////////////// The following code does a malloc/string copy every time I call getText. //////////
> >                 return  token->input->substr(   token->input,
> >                                                 token->getStartIndex(token),
> >                                                 token->getStopIndex(token)
> >                                             );
> >             }
> >
> >             // Nothing to return, there is no input stream
> >             //
> >             return NULL;
> >             break;
> >     }
> > }
> >
> >
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 
> 


