[antlr-interest] Antlr 3 and the newline token problem
Micheal J
open.zone at virgin.net
Mon Nov 28 11:48:13 PST 2005
> > In any case you've omitted the per-character call for col/offset
> > tracking. We were discussing line/col/offset counting not just
> > newlines.
>
> Well, the offset gets tracked anyway, as ANTLR is going
> through a String where it has to track the input position
> anyways. That value is IIRC also accessible (or could be made
> accessible very easily).
That assumes the implementation buffers the whole input in a string/char[],
which is not optimal for larger files (especially if the parser is building
ASTs and the "project" contains multiple files).
> What is left is line breaks. How would you imagine ANTLR
> Lexers do that more efficiently? E.g. always checking if the
> next character(s) is a \r \n, \n or \r? What about users that
> want \0 to be their line separator? Or users that don't want
> that at all?
Special syntax to inform ANTLR of what constitutes a "newline"?
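
To make the idea concrete, here is a minimal sketch in Java of a position
tracker whose newline sequences are supplied as data rather than hard-coded.
It is not ANTLR code: the class PositionTracker and its methods are
hypothetical names made up for illustration, and the 1-based line /
0-based column convention is an assumption. Passing "\0" covers the NUL
separator case, and passing no sequences at all disables line counting.

```java
import java.util.Arrays;

// Hypothetical helper, not part of ANTLR: tracks offset, line and column
// using a caller-supplied set of newline sequences.
final class PositionTracker {
    private final String[] newlines; // sorted longest first
    private int offset = 0;          // characters consumed so far
    private int line = 1;            // 1-based line number (assumed convention)
    private int column = 0;          // 0-based column within the current line

    PositionTracker(String... newlines) {
        for (String nl : newlines) {
            if (nl.isEmpty()) {
                throw new IllegalArgumentException("empty newline sequence");
            }
        }
        this.newlines = newlines.clone();
        // Longest first, so that "\r\n" is tried before "\r".
        Arrays.sort(this.newlines, (a, b) -> Integer.compare(b.length(), a.length()));
    }

    // Consumes one character, or one whole newline sequence, at the current offset.
    void advance(CharSequence input) {
        if (offset >= input.length()) {
            return; // nothing left to consume
        }
        for (String nl : newlines) {
            if (matchesAt(input, offset, nl)) {
                offset += nl.length();
                line++;
                column = 0;
                return;
            }
        }
        offset++;
        column++;
    }

    private static boolean matchesAt(CharSequence input, int at, String nl) {
        if (at + nl.length() > input.length()) {
            return false;
        }
        for (int i = 0; i < nl.length(); i++) {
            if (input.charAt(at + i) != nl.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    int offset() { return offset; }
    int line() { return line; }
    int column() { return column; }
}
```

The loop over sequences on every character is only there to keep the sketch
short. A generator that knew the newline set at grammar-compile time could
presumably fold the check into the switch it already emits per character,
which is the point of asking for syntax rather than a runtime hook.
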
> > If the lexer was built to do it properly, there would be no function
> > calls at all.
>
> The overhead of a function call on x86 is very low. Plus,
> your compiler might decide to inline, at least in managed
> languages, as said. For C++ a no-virtual-method-needed way
> via templates has been discussed.
Whichever route is taken, either the lexer core supports it directly or
performance degrades (probably less so for C/C++).
> The only thing that is (currently) done using an overridden
> method is the newline thing, isn't it? A per character
> virtual method call would be ugly, that's true.
>
> Are you using the Lexer standalone? Even in that case I'd
> wonder if it really makes a difference. For each character
> you have at least one switch, you have the testing of
> alternatives etc. Will a virtual method call for every ~20
> characters make a difference bigger than maybe 1%?
For tracking newlines only, it will be on par with ANTLR v2 lexers, I guess.
> I'm not generally arguing against including something like
> that, but you'd have to find a very flexible way to do so.
> Otherwise users will be unhappy because it doesn't match what
> they want to have, and their solution might get more complex.
Would some syntax that instructs ANTLR on what constitutes a newline do?
Micheal