[antlr-interest] greedy vs nongreedy lexer rules

Ron Burk ronburk at gmail.com
Sun Apr 18 21:32:22 PDT 2010


>  Multi-core systems are the norm now.  In my job, we spend a LOT of time
> determining how best to extract maximum work in minimum time, and parallel
> programming is a big part of that.

Splitting the work of lexical analysis across cores without ending up
slower than a simple, single-threaded state machine (due to constant
stalling on shared data) should be quite a challenge. It sure is easy to
make multi-threaded solutions that are slower than single-threaded ones,
especially when the single-threaded solution fits in L1 cache.
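For concreteness, here is a minimal sketch of the kind of simple,
single-threaded, table-driven state machine I mean (the token kinds,
states, and character classifier are all made up for illustration; this
is not ANTLR's generated code). The whole transition table is a few
dozen bytes, so the hot loop lives entirely in L1 cache:

#include <stdio.h>

/* Hypothetical token kinds, character classes, and states. */
enum { T_IDENT, T_NUMBER, T_OTHER, T_EOF };
enum { C_ALPHA, C_DIGIT, C_OTHER, C_END, NCLASSES };
enum { S_START, S_IDENT, S_NUMBER, S_DONE, NSTATES };

/* Dense transition table: next state indexed by [state][char class]. */
static const unsigned char next_state[NSTATES][NCLASSES] = {
    /* S_START  */ { S_IDENT,  S_NUMBER, S_DONE, S_DONE },
    /* S_IDENT  */ { S_IDENT,  S_IDENT,  S_DONE, S_DONE },
    /* S_NUMBER */ { S_DONE,   S_NUMBER, S_DONE, S_DONE },
    /* S_DONE   */ { S_DONE,   S_DONE,   S_DONE, S_DONE },
};

static int char_class(int c)
{
    if (c == '\0')                                        return C_END;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return C_ALPHA;
    if (c >= '0' && c <= '9')                             return C_DIGIT;
    return C_OTHER;
}

/* Scan one token starting at *p; advance *p past it and return its kind. */
static int next_token(const char **p)
{
    const char *s = *p;
    if (*s == '\0')
        return T_EOF;

    int state = S_START;
    const char *start = s;
    for (;;) {
        int ns = next_state[state][char_class(*s)];
        if (ns == S_DONE)
            break;
        state = ns;
        s++;
    }
    if (s == start)    /* unrecognized character: consume it as T_OTHER */
        s++;
    *p = s;
    return state == S_IDENT ? T_IDENT :
           state == S_NUMBER ? T_NUMBER : T_OTHER;
}

int main(void)
{
    const char *p = "foo 42 bar";
    int kind;
    while ((kind = next_token(&p)) != T_EOF)
        printf("token kind %d\n", kind);
    return 0;
}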

IME, the ultimate in lexing speed requires tokenizing to proceed
independently of parsing. A function call per token (in C/C++ anyway,
using a deterministic, state-driven lexer) tends to swamp the performance
gains from having the lexer proceed in batches; the parser needs to be
just advancing a pointer to get to the next token, not calling anybody.
But that's for ultimate speed.
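A rough sketch of what I mean by that, with invented names and a
deliberately stripped-down lexer (no error handling, no real character
classification); the point is only the shape of the interface. The lexer
fills a flat token array in one batch, and the consuming side gets its
next token with nothing more than a pointer increment:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical token record: kind plus a slice of the input text. */
typedef enum { T_IDENT, T_NUMBER, T_EOF } TokKind;

typedef struct {
    TokKind     kind;
    const char *start;
    size_t      len;
} Token;

/* Lex the whole input into a flat array in one batch. */
static Token *lex_all(const char *src, size_t *ntok)
{
    size_t cap = 16, n = 0;
    Token *toks = malloc(cap * sizeof *toks);

    const char *p = src;
    while (*p) {
        if (n + 1 >= cap) {
            cap *= 2;
            toks = realloc(toks, cap * sizeof *toks);
        }
        if (*p >= '0' && *p <= '9') {
            const char *s = p;
            while (*p >= '0' && *p <= '9') p++;
            toks[n++] = (Token){ T_NUMBER, s, (size_t)(p - s) };
        } else if ((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z')) {
            const char *s = p;
            while ((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z')) p++;
            toks[n++] = (Token){ T_IDENT, s, (size_t)(p - s) };
        } else {
            p++;    /* skip whitespace and anything unrecognized */
        }
    }
    toks[n++] = (Token){ T_EOF, p, 0 };
    *ntok = n;
    return toks;
}

/* The consuming ("parser") side: advancing to the next token is just
 * a pointer increment, no call back into the lexer. */
int main(void)
{
    size_t n;
    Token *toks = lex_all("foo 42 bar 7", &n);

    for (const Token *t = toks; t->kind != T_EOF; t++)  /* t++ is "next token" */
        printf("%s '%.*s'\n",
               t->kind == T_NUMBER ? "NUMBER" : "IDENT",
               (int)t->len, t->start);

    free(toks);
    return 0;
}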

