[antlr-interest] Recovering white space in V3.0
atripp at comcast.net
Mon Jun 13 19:53:14 PDT 2005
>On Jun 11, 2005, at 7:58 AM, Andy Tripp wrote:
>> I'm currently testing my Jazillian translator
>> on gcc's libc. It's about 800,000 lines, and I keep all that as
>> token streams in memory.
>For all files all at once or one file at a time?
All at once. Plus a few symbol tables.
>> not a pretty sight, and I'm off to buy more memory because my 1GB
>> is no longer enough :( I'll be doing lots
>> of memory profiling - I'm sure it's my fault, not yours :)
>Hmm...I wonder if it's my fault!
>> ...speaking of things being your fault...
>> I spent the past week doing CPU profiling. One bottleneck
>> for me was that makeToken() uses reflection (calling newInstance()) -
>> I now have a setTokenFactory() method so that I provide my own
>> makeToken() method.
>I wondered how slow the reflection was...good to know. I'm avoiding
I think the setTokenFactory() way is cleaner anyway.
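For readers following along, here is a minimal sketch of the two ways of creating tokens discussed above: reflective newInstance() versus a caller-supplied factory. All names in it (SimpleToken, TokenFactory, SketchLexer) are hypothetical and invented for illustration. This is not ANTLR's actual API, and it may differ from the setTokenFactory() I added to my own copy.

```java
// Hypothetical token class, for illustration only.
class SimpleToken {
    int type;
    String text;

    SimpleToken() {}                        // used by the reflective path

    SimpleToken(int type, String text) {    // used by the factory path
        this.type = type;
        this.text = text;
    }
}

// Hypothetical factory interface: the caller decides how tokens are built.
interface TokenFactory {
    SimpleToken create(int type, String text);
}

// Hypothetical lexer fragment showing only token creation.
class SketchLexer {
    private final Class<? extends SimpleToken> tokenClass = SimpleToken.class;
    private TokenFactory factory;           // null means "fall back to reflection"

    void setTokenFactory(TokenFactory factory) {
        this.factory = factory;
    }

    SimpleToken makeToken(int type, String text) {
        if (factory != null) {
            // Direct call: no reflective lookup on the hot path.
            return factory.create(type, text);
        }
        try {
            // Reflective path: constructor lookup plus newInstance() on every token.
            SimpleToken t = tokenClass.getDeclaredConstructor().newInstance();
            t.type = type;
            t.text = text;
            return t;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate token class", e);
        }
    }
}
```

With this sketch, calling lexer.setTokenFactory(SimpleToken::new) once at startup switches every later makeToken() call to the direct path. How much that saves depends on the JVM and the workload, so it is worth profiling before and after.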
>> And now after fixing my own bottlenecks, nextToken() and LA() are
>> right near the top in my list of CPU hogs :)
>Well, the lexer is slow, hence the nextToken speed. LA is called a
>HUGE amount in 2.x, repeatedly even in the same decision. 3.0
>decisions are optimal in that they call input.LT(i) for token i at
>most once during a single decision.
Nice. Not only am I lexing the input, but I have hundreds of
pattern-matching-and-replacement rules, and each time I do a
replacement, the replacement text must be lexed too.
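To make the "at most once per decision" point concrete, here is a rough sketch that counts lookahead calls for the same decision written two ways. The stream class, the token names and both predict methods are hypothetical. Tokens are plain strings to keep it short, and none of this is the code ANTLR 2.x or 3.0 actually generates.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical token stream that counts how often lookahead is requested.
class CountingTokenStream {
    private final List<String> tokens;
    private int pos = 0;
    int lookups = 0;

    CountingTokenStream(List<String> tokens) {
        this.tokens = tokens;
    }

    // LT(i): the i-th token of lookahead (1-based), or "<EOF>" past the end.
    String LT(int i) {
        lookups++;
        int idx = pos + i - 1;
        return idx < tokens.size() ? tokens.get(idx) : "<EOF>";
    }

    void consume() {
        pos++;
    }
}

class Decisions {
    // Each alternative re-queries the stream, so LT(1) and LT(2) may be
    // fetched several times within one decision.
    static int predictRepeated(CountingTokenStream in) {
        if (in.LT(1).equals("ID") && in.LT(2).equals("=")) return 1; // assignment
        if (in.LT(1).equals("ID") && in.LT(2).equals("(")) return 2; // call
        if (in.LT(1).equals("ID")) return 3;                         // bare identifier
        return 0;                                                    // no alternative matches
    }

    // Each lookahead token is fetched at most once and kept in a local.
    static int predictOnce(CountingTokenStream in) {
        String t1 = in.LT(1);
        if (!t1.equals("ID")) return 0;
        String t2 = in.LT(2);
        if (t2.equals("=")) return 1;
        if (t2.equals("(")) return 2;
        return 3;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("ID", "(");
        CountingTokenStream a = new CountingTokenStream(input);
        CountingTokenStream b = new CountingTokenStream(input);
        System.out.println("repeated: alt " + predictRepeated(a) + ", lookups " + a.lookups);
        System.out.println("once:     alt " + predictOnce(b) + ", lookups " + b.lookups);
    }
}
```

Both methods pick the same alternative. The second one just never asks the stream for the same lookahead token twice, which matters when the decision runs for every token of 800,000 lines.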
>> That's when I know
>> I must be done, when I can say "I've done about all I can do, and
>> the rest must be Terence's fault" ;)
>Sounds like a South Park episode..."Blame Terence!" Though, they
>misspell "terrence and phillip".
>> Obviously, I'm just kidding, and I love ANTLR, even if I don't
>> believe in treewalkers (or even believe in AST-generating
>> parsers much, now that I think about it - guess I'm a lexer man).
>:) "To thine own self be true!" :)
Keep up the good work.
Andy - the perpetual ANTLR newbie ;)