[antlr-interest] Antlr 3 and the newline token problem

Sat Nov 26 05:29:36 PST 2005

> > The latter really should be handled by the tool as it can have a 
> > serious impact on lexer performance if done inefficiently.
> 
> Maybe I'm lacking imagination, but in what parsing task do 
> you actually expect that counting lines/offsets is a 
> significant performance bottleneck?

In all tasks that need line/col/offset counting and care about performance,
actually. My point is that how it is implemented can have a serious impact
on performance. At the lexer level, you want to reduce the number of
times each character is "processed" or "touched", and the duration of that
processing, to the absolute minimum. Short of editing generated code, I can't
see how that can be achieved optimally if the generator isn't generating the
tracking code as part of the lexer's core.
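
To make "part of the lexer's core" concrete, here is a minimal sketch in Java
of position tracking folded into the same step that reads the character. The
class name TrackingCharStream and its methods are made up for illustration;
this is not ANTLR's generated code or runtime API, and it treats only '\n' as
a line terminator.

```java
// Hypothetical sketch: line/column/offset tracking done inside the one
// method that reads each character, so no character is touched twice.
final class TrackingCharStream {
    private final String input;
    private int offset = 0;  // absolute index of the next unread character
    private int line = 1;    // 1-based line number
    private int column = 0;  // 0-based column within the current line

    TrackingCharStream(String input) {
        this.input = input;
    }

    boolean atEnd() {
        return offset >= input.length();
    }

    // Reads one character and updates all three counters in the same step.
    // Assumption: only '\n' ends a line; '\r' is counted as an ordinary column.
    char consume() {
        char c = input.charAt(offset++);
        if (c == '\n') {
            line++;
            column = 0;
        } else {
            column++;
        }
        return c;
    }

    int offset() { return offset; }
    int line() { return line; }
    int column() { return column; }
}
```

The point of the sketch is only that the newline test sits next to the read
itself, rather than in a separate pass or a separate call per character.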

> I really can't imagine 
> anything complex enough to justify using something like ANTLR 
> but so trivial that line counting might give significant 
> performance differences.

I can: attempting to tack line/col/offset counting onto a lexer via virtual
method overrides, for instance.
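
For illustration, the tacked-on approach might look roughly like the Java
sketch below. BaseLexer stands in for a generated lexer and LineCountingLexer
for the user's subclass; both names, and the consume()/scanAll() methods, are
assumptions made for this example and not ANTLR's actual API. Whether the
extra dispatch is measurable will depend on the target and, in Java, on what
the JIT manages to inline.

```java
// Hypothetical stand-in for a generated lexer; not ANTLR's real classes.
class BaseLexer {
    protected final String input;
    protected int pos = 0;

    BaseLexer(String input) {
        this.input = input;
    }

    // Overridable hook: every consumed character goes through this call.
    protected void consume() {
        pos++;
    }

    // Stand-in for the generated matching loop.
    final void scanAll() {
        while (pos < input.length()) {
            consume();
        }
    }
}

// Line counting added from outside the generated code by overriding the hook.
class LineCountingLexer extends BaseLexer {
    int line = 1;
    int column = 0;

    LineCountingLexer(String input) {
        super(input);
    }

    @Override
    protected void consume() {
        // The character is examined again here, in addition to whatever
        // the matching code has already done with it.
        if (input.charAt(pos) == '\n') {
            line++;
            column = 0;
        } else {
            column++;
        }
        super.consume();
    }
}
```

Here each character costs an overridable call plus a second look at the
character, which is the per-character overhead I would rather the generator
avoided by emitting the tracking inline.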

> About the virtual methods: maybe some C++ crack might be able 
> to change the ANTLR C++ code creation and move lots of the 
> stuff now handled by virtual functions into some template 
> magic. That would of course ruin compatibility with old 
> compilers, so probably both modes should be available.

My comments also extend to the Java/C# targets. Virtual methods cost more
than non-virtual methods. This probably isn't a worry in the general case...

Micheal