[antlr-interest] Any plans of next ANTLR Release

Jim Idle jimi at temporal-wave.com
Sat May 1 12:00:27 PDT 2010


I agree with your conjecture - if you look at the performance of the C runtime as it stands, I have optimized away almost everything but the LA() call. The only way I can get that to drop off the list of hot spots is to inline the code. That makes it faster by eliminating the indirect function call, but it merely hides the fact that the cost is all in traversing the input stream, which sooner or later causes L1 cache misses and CPU stalls.
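
To make that concrete, here is a rough sketch (illustrative names only, not the actual ANTLR3 runtime structures) of the difference between the indirect LA() call and an inlined form. Inlining removes the function-pointer dispatch, but the array read - and the cache behaviour that goes with it - is still there:

    /* Sketch only - hypothetical types, not the real runtime API. */
    #include <stddef.h>

    typedef struct input_stream {
        const unsigned char *data;   /* whole input buffer            */
        size_t               index;  /* current position              */
        size_t               size;   /* total length                  */
        /* indirect lookahead: this call is what shows up as the hot spot */
        int (*LA)(struct input_stream *is, int k);
    } input_stream;

    static int laIndirect(input_stream *is, int k)
    {
        size_t i = is->index + (size_t)(k - 1);
        return i < is->size ? is->data[i] : -1;   /* -1 == EOF */
    }

    /* "Inlined" form: the same fetch as a macro, so the optimizer sees
     * a plain array read and the indirect call goes away. The memory
     * traffic, and the first L1 miss on each cache line, does not.
     */
    #define LA_INLINE(is, k) \
        ((is)->index + (size_t)((k) - 1) < (is)->size \
            ? (int)(is)->data[(is)->index + (size_t)((k) - 1)] : -1)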

With the new VM approach, the memory fetch loop is going to be even more important, and I suspect that LA being 60% of the total execution time may well turn out to be an underestimate. At some point, though, we have to read the characters, and once they are in the L1 cache it doesn't matter how often we touch them; that first cache miss is the killer. So optimizing the DFA is cool, but as soon as we get a cache miss it all goes to hell in a handbasket ;-)
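
For what it's worth, the point about the fetch loop is easy to see in a toy table-driven DFA driver like the one below (again just a sketch, not generated code). Once a cache line of the input is in L1, every further byte on that line is effectively free; the per-line miss on data[i] is the part no amount of DFA cleverness removes:

    /* Toy table-driven DFA loop - illustrative only. */
    #include <stddef.h>

    static int dfaMatch(const unsigned char *data, size_t size,
                        const unsigned char transition[][256],
                        const int *accept)
    {
        int state = 0;
        for (size_t i = 0; i < size; i++) {
            /* the unavoidable fetch: data[i] is where the L1 misses land */
            state = transition[state][data[i]];
        }
        return accept[state];
    }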

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Ron Burk
> Sent: Friday, April 30, 2010 11:21 PM
> To: Terence Parr
> Cc: antlr-interest Interest
> Subject: Re: [antlr-interest] Any plans of next ANTLR Release
> 
> > According to Cox, Thompson's original algorithm generated
> > machine code in the 1960s.
> 
> It was a fair portion of the beauty of his paper that he could
> present mostly complete code in such a short space. However, the
> architecture of the machine he was working on fit this algorithm
> very well (and Thompson was clever in coding it). I would not be as
> blithe as Cox in assuming that modern machines are fast enough to
> forgo that advantage (YAMV - Your App May Vary). Lexical analysis
> can so easily consume a nontrivial portion of parsing CPU cost.
> E.g., http://tinyurl.com/yebzy5o claims:
> 
>   "From Intel XML Parsing Accelerator, we found that character
> checking loop occupies more than 60% CPU cycles of the whole parsing
> process, depending on the property of benchmark."
> 
> Of course, Thompson's algorithm (and Cox's analysis) is oriented
> towards building lots of NFAs on the fly for short-term use.
> Lexical analyzer generators have the luxury of enormous CPU time
> to spit out optimized (if not optimal -- though what one might be
> optimizing for can vary) DFAs.
> 
