[antlr-interest] Antlr 3 and the newline token problem

Sat Nov 26 14:45:30 PST 2005

The difference is only *a few hundred microseconds*, even for 10000 virtual
function calls. See for yourself in this benchmark (it makes 1 billion calls).
Run the JVM both with and without the "-server" option.

Attachments: Calls.java and Calls2.java
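
For readers without the attachments, here is a minimal sketch of the kind of
measurement meant above. It is not the original Calls.java; the class and
method names (VirtualCallBench, Counter, PlusOne, PlusTwo) are made up for
illustration. It times a loop that calls through an interface against the same
loop with the work written inline. Timings depend heavily on the JVM and on
JIT flags such as "-server", so no expected numbers are given.

```java
// Hypothetical stand-in for the scrubbed attachment, not the original code.
class VirtualCallBench {
    interface Counter {
        long step(long x);
    }

    static final class PlusOne implements Counter {
        public long step(long x) { return x + 1; }
    }

    static final class PlusTwo implements Counter {
        public long step(long x) { return x + 2; }
    }

    // Every iteration dispatches through the interface, unless the JIT
    // decides to inline the call.
    static long viaInterface(Counter c, long calls) {
        long acc = 0;
        for (long i = 0; i < calls; i++) {
            acc = c.step(acc);
        }
        return acc;
    }

    // Same arithmetic with no call at all, as the baseline.
    static long inlined(long delta, long calls) {
        long acc = 0;
        for (long i = 0; i < calls; i++) {
            acc += delta;
        }
        return acc;
    }

    public static void main(String[] args) {
        long calls = args.length > 0 ? Long.parseLong(args[0]) : 1000000000L;

        // Warm up with two receiver types so the call site has seen more
        // than one implementation before the timed run.
        viaInterface(new PlusOne(), 100000);
        viaInterface(new PlusTwo(), 100000);
        inlined(1, 100000);

        long t0 = System.nanoTime();
        long a = viaInterface(new PlusOne(), calls);
        long t1 = System.nanoTime();
        long b = inlined(1, calls);
        long t2 = System.nanoTime();

        System.out.println("virtual: " + (t1 - t0) / 1000000 + " ms, result " + a);
        System.out.println("inlined: " + (t2 - t1) / 1000000 + " ms, result " + b);
    }
}
```

Run it as "java VirtualCallBench" and again as "java -server VirtualCallBench";
an optional first argument sets the number of calls.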

On Sat, 26 Nov 2005 22:22:36 +0530, Martin Probst <mail at martin-probst.com>  
wrote:

> Hi,
>
>> In any case you've omitted the per-character call for col/offset
>> tracking. We were discussing line/col/offset counting, not just newlines.
>
> Well, the offset gets tracked anyway, as ANTLR is going through a String
> where it has to track the input position. That value is IIRC
> also accessible (or could be made accessible very easily).
>
> What is left is line breaks. How would you imagine ANTLR Lexers do that
> more efficiently? E.g. always checking if the next character(s) is a \r
> \n, \n or \r? What about users that want \0 to be their line separator?
> Or users that don't want that at all?
>
>> If the lexer was built to do it properly, there would be no function
>> calls at all.
>
> The overhead of a function call on x86 is very low. Plus, your compiler
> might decide to inline, at least in managed languages, as said. For C++
> a no-virtual-method-needed way via templates has been discussed.
>
>> > I don't know what you're
>> > doing with the 4000 lines you have parsed in the same time,
>> > but are 4000 de-refs really significant compared to stepping
>> > through the parsing rules for 4000 lines of code and building the AST?
>>
>> Lexers don't build ASTs. The per-char calls needed for line/col/offset
>> tracking would definitely hurt lexer performance if the counts were
>> tacked on via overridden methods.
>
> The only thing that is (currently) done using an overridden method is
> the newline thing, isn't it? A per character virtual method call would
> be ugly, that's true.
>
> Are you using the Lexer standalone? Even in that case I'd wonder if it
> really makes a difference. For each character you have at least one
> switch, you have the testing of alternatives etc. Will a virtual method
> call for every ~20 characters make a difference bigger than maybe 1%? I
> think there are more important places where ANTLR could be - and is being -
> enhanced, e.g. the String copying thing or various things in the C++
> part that have been discussed countless times on this list.
>
> I'm not generally arguing against including something like that, but
> you'd have to find a very flexible way to do so. Otherwise users will be
> unhappy because it doesn't match what they want to have, and their
> solution might get more complex.
>
> Martin
>
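
The point in the quoted text about the offset being tracked anyway can be
sketched roughly as follows. This is only an illustration under stated
assumptions: LinePositions is a hypothetical helper, not an ANTLR class, and
the demo scanner hard-codes "\r\n", "\n" and "\r" as line terminators where a
real grammar would decide that itself. The idea is that the lexer reports only
line terminators (one call per line, not per character) and the column is
derived from the offset it already has.

```java
// Hypothetical helper, not part of ANTLR.
class LinePositions {
    private int line = 1;       // 1-based number of the current line
    private int lineStart = 0;  // offset of the first character of that line

    // Called once per recognized line terminator, whatever the grammar
    // decides a terminator is. endOffset is the offset just past it.
    void newline(int endOffset) {
        line++;
        lineStart = endOffset;
    }

    int line() {
        return line;
    }

    // 0-based column, computed from the offset with no per-character call.
    int column(int offset) {
        return offset - lineStart;
    }

    // Demo scan: returns {line, column} of the character at offset target.
    static int[] locate(String input, int target) {
        LinePositions pos = new LinePositions();
        int i = 0;
        while (i < target) {
            char c = input.charAt(i);
            if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                if (i + 1 == target) {
                    break; // target is the '\n' of this pair, still on this line
                }
                i += 2;
                pos.newline(i);
            } else if (c == '\r' || c == '\n') {
                i += 1;
                pos.newline(i);
            } else {
                i += 1;
            }
        }
        return new int[] { pos.line(), pos.column(target) };
    }

    public static void main(String[] args) {
        int[] p = locate("ab\r\ncd\nef", 8);
        System.out.println(p[0] + ":" + p[1]);
    }
}
```

A user who wants "\0" as the separator, or no line tracking at all, would only
change where newline() gets called; the offset arithmetic stays the same.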

-- 
The problem with the world is that wise are never sure and fools are ever  
so damn confident
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Calls.java
Type: application/octet-stream
Size: 1129 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20051127/4ea737d5/Calls.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Calls2.java
Type: application/octet-stream
Size: 988 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20051127/4ea737d5/Calls2.obj