[antlr-interest] Antlr 3 and the newline token problem

Mon Nov 28 11:48:13 PST 2005

> > In any case you've omitted the per-character call for col/offset 
> > tracking. We were discussing line/col/offset counting not just 
> > newlines.
> 
> Well, the offset gets tracked anyway, as ANTLR is going 
> through a String where it has to track the input position 
> anyways. That value is IIRC also accessible (or could be made 
> accessible very easily).

That assumes the implementation buffers the whole input in a string/char[],
which is not optimal for larger files (especially if the parser is building
ASTs and the "project" contains multiple files).
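
A rough sketch of the non-buffering alternative, for illustration only. The
class name StreamingCharSource and its methods are hypothetical and not part
of any ANTLR release. It reads through a small fixed-size window and counts
the absolute offset itself, so memory use does not grow with the file size.
It deliberately leaves out the lookahead and rewind a real lexer would need.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical sketch: streams characters through a fixed-size window. */
final class StreamingCharSource {
    private final Reader in;
    private final char[] buf;   // fixed window, reused for every refill
    private int pos = 0;        // next unread index in buf
    private int limit = 0;      // number of valid chars currently in buf
    private long offset = 0;    // absolute count of characters consumed

    StreamingCharSource(Reader in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.in = in;
        this.buf = new char[bufferSize];
    }

    /** Returns the next character, or -1 at end of input. */
    int next() throws IOException {
        if (pos == limit) {
            // Window exhausted: refill it instead of growing it.
            limit = in.read(buf, 0, buf.length);
            pos = 0;
            if (limit <= 0) {
                limit = 0;
                return -1;
            }
        }
        offset++;
        return buf[pos++];
    }

    /** Absolute offset, tracked without keeping the whole input in memory. */
    long offset() {
        return offset;
    }
}
```

The offset has to be counted explicitly here, because there is no longer a
string index to read it from.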

> What is left is line breaks. How would you imagine ANTLR 
> Lexers do that more efficiently? E.g. always checking if the 
> next character(s) is a \r \n, \n or \r? What about users that 
> want \0 to be their line separator? Or users that don't want 
> that at all?

Special syntax to inform ANTLR of what constitutes a "newline"?
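
To make the idea concrete, here is a minimal sketch of what the lexer core
could do once it has been told what a newline is. Everything in it is an
assumption for illustration: the class PositionTracker, its constructor
parameters and its method names do not exist in ANTLR, and no grammar syntax
is implied. The terminator set covers the cases raised above: "\r\n" with
CR LF merging, "\0" as a custom separator, or "" for users who want no line
counting at all.

```java
/** Hypothetical sketch: line/column/offset counting with configurable newlines. */
final class PositionTracker {
    private final String terminators;  // each char in here ends a line; "" disables line counting
    private final boolean mergeCrLf;   // if true, "\r\n" counts as a single line break
    private int offset = 0;            // characters consumed so far
    private int line = 1;              // 1-based line number
    private int column = 0;            // characters consumed on the current line
    private boolean lastWasCr = false; // true only right after a '\r' that ended a line

    PositionTracker(String terminators, boolean mergeCrLf) {
        this.terminators = terminators;
        this.mergeCrLf = mergeCrLf;
    }

    /** Meant to be called from the consume loop once per character. */
    void advance(char c) {
        offset++;
        if (mergeCrLf && lastWasCr && c == '\n') {
            // Second half of "\r\n": the line break was already counted at '\r'.
            lastWasCr = false;
            return;
        }
        lastWasCr = false;
        if (terminators.indexOf(c) >= 0) {
            line++;
            column = 0;
            lastWasCr = (c == '\r');
        } else {
            column++;
        }
    }

    /** Convenience helper that feeds a whole character sequence. */
    void advanceAll(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            advance(text.charAt(i));
        }
    }

    int offset() { return offset; }
    int line()   { return line; }
    int column() { return column; }
}
```

For example, new PositionTracker("\r\n", true) treats \r, \n and \r\n each
as one line break, while new PositionTracker("", false) only counts offsets
and columns. A generated lexer could inline the body of advance() so that no
per-character method call is left.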

> > If the lexer was built to do it properly, there would be no function
> > calls at all.
> 
> The overhead of a function call on x86 is very low. Plus, 
> your compiler might decide to inline, at least in managed 
> languages, as said. For C++ a no-virtual-method-needed way 
> via templates has been discussed.

Either way, the lexer core has to support it, or performance degrades
(probably less so for C/C++).

> The only thing that is (currently) done using an overridden 
> method is the newline thing, isn't it? A per character 
> virtual method call would be ugly, that's true.
> 
> Are you using the Lexer standalone? Even in that case I'd 
> wonder if it really makes a difference. For each character 
> you have at least one switch, you have the testing of 
> alternatives etc. Will a virtual method call for every ~20 
> characters make a difference bigger than maybe 1%?

If only newlines are tracked, it will be on par with ANTLR v2 lexers, I guess.

> I'm not generally arguing against including something like 
> that, but you'd have to find a very flexible way to do so. 
> Otherwise users will be unhappy because it doesn't match what 
> they want to have, and their solution might get more complex.

Would some syntax that instructs ANTLR on what constitutes a newline do?

Micheal