[antlr-interest] greedy vs nongreedy lexer rules

Marcin Rzeźnicki marcin.rzeznicki at gmail.com
Mon Apr 19 13:01:57 PDT 2010


On Mon, Apr 19, 2010 at 6:32 AM, Ron Burk <ronburk at gmail.com> wrote:
>>  Multi-core systems are the norm now.  In my job, we spend a LOT of time
>> determining how best to extract maximum work in minimum time, and parallel
>> programming is a big part of that.
>
> Splitting the work of lexical analysis across cores without ending up
> slower than a simple, single-threaded state machine (due to constant
> stalling on shared data)... should be a challenge. It sure is easy to
> make multi-threaded solutions that are slower than single-threaded,
> especially if the single-threaded solution fits in L1 cache.
>
> IME, the ultimate in lexing speed requires tokenizing to proceed
> independently. A function call per token (in C/C++ anyway, using
> a deterministic state-driven lexer) tends to swamp performance gains
> from having the lexer proceed in batches; the parser needs to be just
> advancing a pointer to get to the next token, not calling anybody.
> But that's for ultimate speed.
>
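To make sure I follow the "batches" idea: as I understand it, the lexer fills a token buffer up front, and the parser then just walks a pointer through it instead of calling a lex function per token. A minimal sketch (not ANTLR's API; the token kinds and the whitespace-split lexing rule are invented for illustration):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* points into the source buffer, no copying */
    size_t len;
} Token;

/* Lex ALL of `src` into `out` (capacity `cap`) in one batch.
   Returns the token count, including the trailing TOK_EOF.
   The "grammar" here is trivial: digit runs are numbers, anything
   else up to a space is an identifier. */
size_t lex_batch(const char *src, Token *out, size_t cap) {
    size_t n = 0;
    const char *p = src;
    while (n + 1 < cap) {
        while (*p == ' ') p++;          /* skip whitespace */
        if (*p == '\0') break;
        Token t;
        t.start = p;
        if (isdigit((unsigned char)*p)) {
            t.kind = TOK_NUMBER;
            while (isdigit((unsigned char)*p)) p++;
        } else {
            t.kind = TOK_IDENT;
            while (*p && *p != ' ') p++;
        }
        t.len = (size_t)(p - t.start);
        out[n++] = t;
    }
    out[n++] = (Token){ TOK_EOF, p, 0 }; /* sentinel for the parser */
    return n;
}
```

The parser side is then just `Token *tok = buf; ... tok++;` per token, i.e. a pointer increment rather than a function call, which is the point Ron is making about call overhead.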

Out of curiosity, may I ask why a "function call per token" is
bad for multi-core performance? I am asking because, the way I see
this issue, pursuing two different paths in an NFA has no inherent
coupling. It is simply a matter of replicating the state and merging
the copies back into a single outcome state upon reaching some barrier
marking the "end of work" state. Am I missing something?
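To illustrate what I mean by replication and merging: the classic state-set simulation of an NFA carries all live paths as a set, each input symbol "replicates" the computation across every outgoing transition, and the set union merges the paths back together with no shared mutable data between them. A sketch over a tiny hypothetical NFA (states and transitions invented for the example, represented as a bitmask):

```c
#include <assert.h>

/* Hypothetical 3-state NFA over {a, b}:
     state 0 --a--> state 1
     state 1 --b--> state 2
   Accepting states are 1 and 2, so it matches "a" and "ab".
   The active-state set is a bitmask: bit i set = state i is live. */
static unsigned step(unsigned states, char c) {
    unsigned next = 0;                       /* merged successor set */
    if ((states & 1u) && c == 'a') next |= 2u;  /* path via 0 -a-> 1 */
    if ((states & 2u) && c == 'b') next |= 4u;  /* path via 1 -b-> 2 */
    return next;  /* union of all paths = the "merge" step */
}

int nfa_match(const char *s) {
    unsigned states = 1u;                    /* start in {0} */
    for (; *s; s++)
        states = step(states, *s);
    return (states & (2u | 4u)) != 0;        /* any accepting state live? */
}
```

Each path's successor set is computed independently of the others; the only synchronization point is the union at the end of the step, which is what I had in mind by a barrier marking the "end of work" state.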


-- 
Greetings
Marcin Rzeźnicki
