[antlr-interest] Antlr lexer vs. flex lexer speed (C++)

Alex Sedow alexsedow at mail.ru
Mon Jun 28 10:45:47 PDT 2004


> On Fri, Jun 25, 2004 at 09:03:55PM +0400, Alex Sedow wrote:
> > 2. Whether ANTLR 3.x will use std::string, or will replace it with
special
> > string class?
>
> I'll investigate making the string class a template parameter to
> parsers/lexers this to ensure the right class for the job can be used
(e.g.
> one for unicode or something simpler where needed)
>
> Cheers,
>
> Ric

1. Maybe just use typedef?

2. String or pointer to string? Smart pointer to string faster. For example
std::string (MSVC, 32-bit) for char defined as:

class std::string
{
    Allocator *pAllocator;

    enum
    { // length of internal buffer
        _BUF_SIZE = 16
    };

    union _Bxty
    { // storage for small buffer or pointer to larger one
        _Elem _Buf[_BUF_SIZE];
        _Elem *_Ptr;
    } _Bx;

    size_type _Mysize; // current length of string
    size_type _Myres; // current storage reserved for string
};

Totally 28 bytes. It is a big memory usage penalty for lexer. Let's consider
a penalties which arise at transfer of a string as function argument:
1. When std::string transfers as pointer.
a) minimal transfer penalty (4 bytes).
b) additional penalty for access to string (2 pointers access): read pointer
to char * from std::string + access with pointer.
2. When std::string transfers as value.
a) big transfer penalty (28 bytes).
b) minimal access penalty(?).

To reduce these penalties for lexer I use special class "Pointer to const
hash string".
//--------------------------------------------------------------------------
----
// hash string in memory:
//
// bytes:
// 4 - reference count <- pointer
// 4 - unsigned hash code (evaluates one time for identifiers when string
created)
// 4 - [string length (not aligned)]
// N - ASCIIZ string
// NA-N - ( NA = aligned ( N ) ) - not used
// where NA is power of 2 big enough to contain
(4+4+4+sizeof(string+trailing '\0'))
//--------------------------------------------------------------------------
----
To reduce increasing/decreasing reference counter penalty I use another
(parallel) pointer to string (to transfer string to functions that could not
potentialy store string. For example bool isExist(key) const function in
hash table not store value).

Alex.

P.S. If somebody knows how to speed up the given variant of a string, please
let me know.



 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/
 



More information about the antlr-interest mailing list