[antlr-interest] Recovering white space in V3.0

Mon Jun 13 11:28:31 PDT 2005

On Jun 11, 2005, at 7:58 AM, Andy Tripp wrote:
> Terence,
> I'm currently testing my Jazillian translator
> on gcc's libc. It's about 800,000 lines, and I keep all that as  
> token streams in memory.

Hi :)

For all files all at once or one file at a time?

> It's
> not a pretty sight, and I'm off to buy more memory because my 1GB  
> is no longer enough :( I'll be doing lots
> of memory profiling - I'm sure it's my fault, not yours :)

Hmm...I wonder if it's my fault!

> ...speaking of things being your fault...
> I spent the past week doing CPU profiling. One bottleneck
> for me was that makeToken() uses reflection (calling newInstance()) -
> I now have a setTokenFactory() method so that I provide my own
> makeToken() method.

I wondered how slow the reflection was...good to know.  I'm avoiding
it in 3.0.
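
For anyone following along, here is a rough sketch of the difference
between reflective token creation and a caller-supplied factory.  It is
only an illustration: SimpleToken, makeTokenReflectively() and
makeTokenWithFactory() are made-up names, not the ANTLR 2.x or 3.0 API,
and Andy's actual setTokenFactory() may well look different.

```java
import java.util.function.Supplier;

class TokenFactorySketch {
    /** Hypothetical minimal token type; NOT ANTLR's Token class. */
    public static class SimpleToken {
        int type;
        String text;

        public SimpleToken() {}
    }

    /** Reflective creation, in the spirit of makeToken() calling newInstance(). */
    static SimpleToken makeTokenReflectively(Class<? extends SimpleToken> tokenClass,
                                             int type, String text) {
        try {
            // Constructor lookup and invocation go through reflection on every call.
            SimpleToken t = tokenClass.getDeclaredConstructor().newInstance();
            t.type = type;
            t.text = text;
            return t;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + tokenClass, e);
        }
    }

    /** Factory-based creation: the caller supplies an ordinary constructor call. */
    static SimpleToken makeTokenWithFactory(Supplier<? extends SimpleToken> factory,
                                            int type, String text) {
        SimpleToken t = factory.get();   // plain call, no reflection involved
        t.type = type;
        t.text = text;
        return t;
    }
}
```

Both paths build the same token; the factory version just replaces the
reflective lookup with a direct call, e.g.
makeTokenWithFactory(SimpleToken::new, type, text).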

> And now after fixing my own bottlenecks, nextToken() and LA() are
> right near the top in my list of CPU hogs :)

Well, the lexer is slow, hence the nextToken() speed.  In 2.x, LA() is
called a HUGE number of times, repeatedly even within the same
decision.  3.0 decisions are optimal in that they call input.LT(i) for
token i at most once during a single decision.
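
To make the "at most once" point concrete, here is a toy sketch.  It is
not generated ANTLR code: CountingStream, predictRepeated() and
predictOnce() are hypothetical names, and the three-alternative decision
is invented purely to count lookahead calls.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LookaheadSketch {
    /** Hypothetical buffered token stream that counts LT() calls. */
    static class CountingStream {
        private final List<String> buffer = new ArrayList<>();
        private final Iterator<String> source;
        private int position = 0;
        int ltCalls = 0;

        CountingStream(List<String> tokens) {
            this.source = tokens.iterator();
        }

        /** Return the i-th token of lookahead (1-based), or "EOF" past the end. */
        String LT(int i) {
            ltCalls++;
            int index = position + i - 1;
            while (buffer.size() <= index && source.hasNext()) {
                buffer.add(source.next());   // fill the buffer lazily
            }
            return index < buffer.size() ? buffer.get(index) : "EOF";
        }

        void consume() {
            position++;
        }
    }

    /** Re-queries the same lookahead while testing each alternative. */
    static int predictRepeated(CountingStream in) {
        if (in.LT(1).equals("int") && in.LT(2).equals("ID")) return 1;
        if (in.LT(1).equals("int") && in.LT(2).equals("(")) return 2;
        if (in.LT(1).equals("ID")) return 3;
        return 0;   // no viable alternative
    }

    /** Reads each lookahead depth at most once for the whole decision. */
    static int predictOnce(CountingStream in) {
        String t1 = in.LT(1);
        if (t1.equals("ID")) return 3;
        if (!t1.equals("int")) return 0;
        String t2 = in.LT(2);
        if (t2.equals("ID")) return 1;
        if (t2.equals("(")) return 2;
        return 0;   // no viable alternative
    }
}
```

Both methods pick the same alternative for the same input; the second
simply holds each lookahead token in a local instead of asking the
stream again for every test.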

> That's when I know
> I must be done, when I can say "I've done about all I can do, and
> the rest must be Terence's fault" ;)

Sounds like a South Park episode..."Blame Terence!"  Though, they  
misspell "terrence and phillip".

> Obviously, I'm just kidding, and I love ANTLR, even if I don't

hooray!

> believe in treewalkers (or even believe in AST-generating
> parsers much, now that I think about it - guess I'm a lexer man).

:)  "To thine own self be true!" :)

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com