[antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

Ruslan Zasukhin ruslan_zasukhin at valentina-db.com
Wed Nov 16 08:35:44 PST 2011


On 11/16/11 6:00 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:

> [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks,
> oops.
> 
> Do not use the $text annotations if you want performance, they are purely
> for convenience - I must have said this 5000 times and I wish I had never
> added that bit ;) I also told you 3 or 4 times in various emails not to use
> it. I think that that is in the API docs somewhere, but I should make sure
> that it is, if it is not.

Right, you did tell me ...

But the docs, the ANTLR books, and the examples all show this everywhere:

    hex_string_literal

    :    s = HEX_NUMBER  -> CONST_STR_HEX[$s.text->chars]

Yes, I checked the C API docs again today, but I have not found any dedicated
page which says

    Java guys do this
    C guys do this.


> There is no memory leak, but the auto string stuff does not release until
> you free the string factory, which only happens when you free the parser,
> not when you reuse it. Because it allocates small strings all the time, it
> kills performance, and then you will page.

Clear.

So once I "fix" all the places that use .text, the memory problem should
disappear by itself.
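
To check that I understand the behaviour you describe, here is a toy model in plain C. It is only a sketch: the names toy_factory, toy_parser, toy_new_string, toy_parser_reuse and toy_parser_free are hypothetical and are not the ANTLR3 C runtime API. It only illustrates why strings pile up when the parser is reused instead of freed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: hypothetical types, NOT the ANTLR3 C runtime's. */
typedef struct toy_string {
    struct toy_string *next;
    char              *chars;
} toy_string;

typedef struct {
    toy_string *head;  /* every string the factory has handed out */
    size_t      live;  /* how many of them are still allocated    */
} toy_factory;

typedef struct {
    toy_factory factory;
    size_t      position;  /* stands in for the parse state */
} toy_parser;

/* Allocate a small string and remember it in the factory. */
static const char *toy_new_string(toy_factory *f, const char *src, size_t len)
{
    toy_string *s = (toy_string *)malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->chars = (char *)malloc(len + 1u);
    if (s->chars == NULL) { free(s); return NULL; }
    memcpy(s->chars, src, len);
    s->chars[len] = '\0';
    s->next = f->head;
    f->head = s;
    f->live++;
    return s->chars;
}

/* Reuse resets the parse state but does not touch the factory. */
static void toy_parser_reuse(toy_parser *p)
{
    p->position = 0;
}

/* Only freeing the parser releases the factory's strings. */
static void toy_parser_free(toy_parser *p)
{
    toy_string *s = p->factory.head;
    while (s != NULL) {
        toy_string *next = s->next;
        free(s->chars);
        free(s);
        s = next;
    }
    p->factory.head = NULL;
    p->factory.live = 0;
}
```

In this model, three runs of 40 small strings each leave 120 strings alive after the last toy_parser_reuse(), and only toy_parser_free() brings the count back to zero.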


> xxx: s=HEX_NUMBER { $s.type = CONST_STR_HEX; } ;

> I think that the field name is type but you get the idea.

Yes, I will try this ASAP and give feedback.
I have 40 such places in the parser, and some more in the tree parser.


>  Don't use the
> fake object oriented stuff when you want performance, use the structs
> directly - you will find that it is many times faster than the v2 C++, not
> slower - this is C and you should get as close to the metal as you can.

I very much hope so :-)

For the PARSER, I think I see how I can use this $s.type.
I will check the other 39 places in the parser right now :)

=====================================
It is not clear to me what we can do in the Tree Parser, though.

Say I have some token, e.g. a date, a time, or another literal.
I put a label on it, and now I need to get its TEXT.

general_literal returns [ENode_Const_Ptr res]

    : cd=CONST_DATE
            { res=make_enode_date ( GET_FBL_STRING($cd.text) );  }



So far I have found that I can do something like this:

general_literal returns [ENode_Const_Ptr res]

    : cd=CONST_DATE
      {
              pANTLR3_COMMON_TOKEN pToken = $cd->getToken( $cd );
              ANTLR3_MARKER pStart = pToken->getStartIndex( pToken );
              ANTLR3_MARKER pEnd   = pToken->getStopIndex( pToken );
             .... Do some job ...
      }


Does such code in a TreeParser look correct to you?

Is it really safe, and do getStartIndex / getStopIndex always return correct
pointers?

Of course this can be extracted into a helper function, so that each of the
many places needs only one line of code ...
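
Such a helper might look like the sketch below. It rests on assumptions I have not confirmed: that for 8-bit input the start and stop markers hold the addresses of characters in the input buffer, and that the stop marker is inclusive. The names marker_t and copy_token_text are hypothetical; marker_t only stands in for ANTLR3_MARKER so that the sketch compiles without the runtime headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for ANTLR3_MARKER. ASSUMPTION: the marker holds the address
   of a character in an 8-bit input buffer. */
typedef intptr_t marker_t;

/* Copy the token text in [start, stop] into a new NUL-terminated buffer
   owned by the caller. ASSUMPTION: stop is inclusive, so an empty token
   has stop == start - 1. Returns NULL for a null start marker or when
   allocation fails. Hypothetical helper, not part of the runtime. */
static char *copy_token_text(marker_t start, marker_t stop)
{
    size_t len;
    char  *out;

    if (start == 0) {
        return NULL;
    }

    len = (stop >= start) ? (size_t)(stop - start) + 1u : 0u;

    out = (char *)malloc(len + 1u);
    if (out == NULL) {
        return NULL;
    }

    memcpy(out, (const char *)start, len);
    out[len] = '\0';
    return out;
}
```

In the tree parser action this would be called with the values of pToken->getStartIndex( pToken ) and pToken->getStopIndex( pToken ), and the caller frees the result. Because the copy does not go through the string factory, nothing accumulates there across reuse(). For wide-character input the length arithmetic would have to change.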

I just believe there is no C example and no docs page which discusses
this for a TreeParser in C. If one exists, please point me to it :-)


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



