[antlr-interest] Hand made lexer 600(!!!!!) times faster than ANTLR's one

Thu Feb 24 02:16:25 PST 2005

Hi Ruslan,

> Since the performance issue has been raised on this list again,
> I want to point out our recent discovery.
> 
> We made the simplest possible ANTLR lexer, in 5 lines, to parse
> comma/TAB-separated files.
> We use C++.
> 
> In our tests, importing one file took 2 minutes.
> That is incredibly long.
> 
> 
> In one day our developer wrote a lexer by hand,
> and the time is, wow -- 0.2 seconds.
> 
> 600 times guys!     600 times!!!

That sucks. :-(

Are the grammar and input file available? It would be useful to try out the
Java and C# code too.

If it's possible, please send me the hand-coded lexer; I'd like to try out a
C# version of it against what ANTLR/C# generates.

> This is incredible.
> This is very disappointing.
> 
> If you look into ANTLR, then of course you see that xxxxxx
> 
> * LA() calls 
> 
> * creation of TOKENs that copy strings using the std::string class
>         
>     in our version we simply return a pointer to the start and a pointer
>     to the end of the token in the PARSED text.
>     - We do not copy any byte
>     - we do not call any new operator
>     - we do not create std::string objects
>     - we do not destroy them
>     - we do not call delete operators.
>     
>     instead of all of the above, we spend ZERO time.
> 
> * and probably exceptions...

Couldn't agree more.
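
To make the pointer-pair idea concrete, here is a minimal sketch in C++ of a
lexer for comma/TAB-separated text whose tokens only point into the parsed
buffer. It is not the hand-made lexer from the original post, which has not
been shown; the names Token and FieldLexer are hypothetical, and the sketch
assumes fields are separated by ',' or TAB, records by '\n', and that there is
no quoting or escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical token type: two pointers into the caller's buffer, no owned text.
struct Token {
    const char* begin;  // first character of the field
    const char* end;    // one past the last character of the field
    bool endsRecord;    // true if the field was ended by '\n' or end of input

    std::size_t size() const { return static_cast<std::size_t>(end - begin); }
    // Copies only when the caller explicitly asks for an owning string.
    std::string str() const { return std::string(begin, end); }
};

// Hypothetical lexer over [first, last). The buffer must outlive every token.
class FieldLexer {
public:
    FieldLexer(const char* first, const char* last)
        : cur_(first), last_(last), done_(first == last) {
        assert(first <= last);
    }

    // Writes the next field to `out`; returns false once the input is exhausted.
    bool next(Token& out) {
        if (done_) return false;
        const char* start = cur_;
        while (cur_ != last_ && *cur_ != ',' && *cur_ != '\t' && *cur_ != '\n') {
            ++cur_;
        }
        out.begin = start;
        out.end = cur_;
        out.endsRecord = (cur_ == last_ || *cur_ == '\n');
        if (cur_ == last_) {
            done_ = true;
        } else {
            const bool newline = (*cur_ == '\n');
            ++cur_;  // skip the separator
            // A trailing newline does not start an extra empty field.
            if (newline && cur_ == last_) done_ = true;
        }
        return true;
    }

private:
    const char* cur_;
    const char* last_;
    bool done_;
};
```

Calling next() in a loop walks the whole buffer without a single allocation;
str() is the only place a copy happens, and only on request. The price is that
tokens become invalid as soon as the buffer is freed or modified.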

> I do not understand why a small string class cannot be written for
> ANTLR C++ in 100 lines of code, which would increase the speed
> of ANTLR by 10 times AT LEAST.

My apologies if you've already done this, but donating code to fix the
problem (or to illustrate how it might be fixed) is very welcome. As ANTLR3
matures, it can be molded to make it easier to incorporate these ideas on
improving performance.
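
As one illustration of what such a small string class might look like, here is
a rough sketch of a non-owning string reference that a token could hold in
place of a std::string. The name TextRef and its interface are assumptions
made for this example; nothing like it is part of the ANTLR C++ runtime, and
the speed-up it would give has not been measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning string class: it refers to characters owned by the
// input buffer, so constructing, copying, and destroying it never allocates.
class TextRef {
public:
    TextRef() : data_(""), size_(0) {}
    TextRef(const char* data, std::size_t size) : data_(data), size_(size) {
        assert(data != nullptr || size == 0);
    }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

    // Compares by content without building a temporary std::string.
    bool equals(const char* literal) const {
        const std::size_t n = std::strlen(literal);
        return n == size_ && std::memcmp(data_, literal, n) == 0;
    }

    // The only operation that copies, and only when the caller asks for it.
    std::string toString() const { return std::string(data_, size_); }

private:
    const char* data_;  // not null-terminated; valid only while the buffer lives
    std::size_t size_;
};
```

The trade-off is lifetime: a TextRef is only valid while the text it points
into is alive, so any code that keeps token text after the input buffer is
released has to call toString() first.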

Cheers,

Micheal