[antlr-interest] Recovering white space in V3.0

Sat Jun 11 07:58:33 PDT 2005

>> In other words, have CommonTokenStream do buffering and then maybe
>> provide an alternative LeanTokenStream that doesn't. But don't just
>> provide LeanTokenStream, because then people will have to write their
>> own buffering code.
>
>Exactly my plan, Andy! :)
>
>As I noted privately this morning to Bryan Ewbank, I parsed 90,000-line
>C++ header files on my 90 MHz, 64 MB RAM NeXT box 10 years ago with no
>ill effect (PCCTS buffered it all up to do syntactic predicates).  I
>estimate that for Bryan's 100,000-line files, you might consume 30 MB
>in Java to buffer all text and all tokens.

Terence,
I'm currently testing my Jazillian translator
on gcc's libc. It's about 800,000 lines, and I 
keep all that as token streams in memory. It's
not a pretty sight, and I'm off to buy more memory 
because my 1GB is no longer enough :( I'll be doing lots
of memory profiling - I'm sure it's my fault, not yours :)

...speaking of things being your fault...
I spent the past week doing CPU profiling. One bottleneck
for me was that makeToken() uses reflection (calling newInstance()) -
I now have a setTokenFactory() method so that I provide my own
makeToken() method. 
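
For anyone curious what that change looks like, here is a minimal sketch,
not the actual ANTLR or Jazillian code. SimpleToken, TokenFactory, and this
Lexer class are hypothetical stand-ins; only the names makeToken() and
setTokenFactory() come from the description above. The idea is that
makeToken() falls back to reflection only when no factory has been set.

```java
public class TokenFactoryDemo {
    /** Hypothetical minimal token; stands in for the lexer's real token class. */
    public static class SimpleToken {
        public int type;
        public String text;
        public SimpleToken() {}
    }

    /** Hypothetical factory interface (an assumption, not ANTLR's API). */
    public interface TokenFactory {
        SimpleToken create(int type, String text);
    }

    /** Hypothetical lexer showing both ways of creating a token. */
    public static class Lexer {
        private final Class<? extends SimpleToken> tokenClass = SimpleToken.class;
        private TokenFactory factory; // null means "fall back to reflection"

        public void setTokenFactory(TokenFactory factory) {
            this.factory = factory;
        }

        public SimpleToken makeToken(int type, String text) {
            if (factory != null) {
                // Fast path: a plain call that ends in an ordinary "new".
                return factory.create(type, text);
            }
            try {
                // Slow path: reflective instantiation on every token.
                SimpleToken t = tokenClass.getDeclaredConstructor().newInstance();
                t.type = type;
                t.text = text;
                return t;
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot create token", e);
            }
        }
    }

    public static void main(String[] args) {
        Lexer lexer = new Lexer();
        // Install a factory so makeToken() no longer touches reflection.
        lexer.setTokenFactory((type, text) -> {
            SimpleToken t = new SimpleToken();
            t.type = type;
            t.text = text;
            return t;
        });
        SimpleToken tok = lexer.makeToken(1, "int");
        System.out.println(tok.type + " " + tok.text);
    }
}
```

Both paths return the same kind of token, so the rest of the lexer does not
need to know which one is in use. How much time this saves depends on the
JVM and the input, so it is worth profiling before and after.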

And now after fixing my own bottlenecks, nextToken() and LA() are
right near the top in my list of CPU hogs :) That's when I know
I must be done, when I can say "I've done about all I can do, and
the rest must be Terence's fault" ;) 

Obviously, I'm just kidding, and I love ANTLR, even if I don't
believe in treewalkers (or even believe in AST-generating
parsers much, now that I think about it - guess I'm a lexer man).

Andy