[antlr-interest] Tokens that span across char streams

David-Sarah Hopwood david-sarah at jacaranda.org
Wed Aug 26 19:30:15 PDT 2009


Stanislav Sokorac wrote:
> I guess the tricky thing will be to insert this functionality without
> significantly adding to the run time. If the stream has to check for
> macros, and also mux between the regular stream and the macro definition,
> I'm adding two 'if' checks on every single character. Maybe more if I'm also
> selectively updating character positions.

I wouldn't worry about that. There are already several dynamically
dispatched method calls per character.
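To illustrate how small the per-character cost of the multiplexing can be, here is a minimal sketch, assuming the lexer (not the stream) detects macro uses and tells the stream to switch, as suggested in the quoted text. `MacroCharSource`, `pushMacro` and `read` are hypothetical names, not part of the ANTLR runtime. A real ANTLR 3 CharStream would also need lookahead (`LA`), `mark`/`rewind` and line/column tracking, all omitted here. In the common case, `read()` only checks whether the current buffer still has input.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch (not an ANTLR API): a character source that
 * multiplexes between the main input and macro bodies pushed on top
 * of it. The lexer decides when to switch by calling pushMacro(), so
 * read() never scans for macros itself.
 */
public class MacroCharSource {
    public static final int EOF = -1;

    /** One input buffer plus the read position within it. */
    private static final class Frame {
        final String text;
        int pos;
        Frame(String text) { this.text = text; }
    }

    private final Deque<Frame> frames = new ArrayDeque<>();

    public MacroCharSource(String mainInput) {
        frames.push(new Frame(mainInput));
    }

    /** Called by the lexer after it recognizes a macro use; its body is read next. */
    public void pushMacro(String body) {
        frames.push(new Frame(body)); // an empty body is simply popped by read()
    }

    /** Returns the next character, or EOF once every buffer is exhausted. */
    public int read() {
        while (!frames.isEmpty()) {
            Frame top = frames.peek();
            if (top.pos < top.text.length()) {
                return top.text.charAt(top.pos++); // common case: current buffer has input
            }
            frames.pop(); // this buffer is finished; resume the one beneath it
        }
        return EOF;
    }

    public static void main(String[] args) {
        MacroCharSource src = new MacroCharSource("x=M;");
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = src.read()) != EOF) {
            if (c == 'M') {
                src.pushMacro("42"); // stand-in for the lexer spotting a macro name
                continue;
            }
            out.append((char) c);
        }
        System.out.println(out); // prints "x=42;"
    }
}
```

Nested macros work the same way: a macro body that itself triggers `pushMacro` just adds another buffer, and reading resumes in the enclosing one when it runs out.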

> I could have the lexer signal to the stream when the switch is needed to
> remove one of those, at least.
> 
> Or am I over-optimizing here? Is the lexer already doing way more on every
> character than I'm talking about here? I am going to be running into some
> significantly large files, so I'd like to avoid overhead wherever I can...

For parsing large files, I would worry more about memory. ANTLR lexers and
parsers are quite memory-hungry, and typically almost nothing can be
garbage-collected until the input is completely parsed and the CharStream
and TokenStream objects have been discarded.

(FWIW, I think most programmers systematically overestimate the performance
effect of running additional code, and underestimate the effect of memory
usage. The latter can absolutely kill performance if it leads to swapping.)

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com


