[antlr-interest] v3 lexing time

Sat May 27 11:36:27 PDT 2006

[antlr-interest: on antlr-dev we've been discussing parsing/tree  
building speed for v3 vs v2 on java 1.4.2 source]

Hi, ok, i measured lexer time now and it looks like about 1600ms to  
do lexing regardless of tree construction, which is consistent.   
Parsing+lexing is like 6200ms and building trees+parse+lex is about  
10000ms.  I think this implies that doing 1.4.2 java source costs:

lexing	1600ms
parsing	6200-1600 = 4600ms
tree build 10000 - 6200 = 3800ms

Lexing seems to be about 1600 / 6200 = .258 or 26% of total parse  
time.  Trees are pretty expensive at 3.8s.

Now, here is the interesting piece of data.  v2 is mostly lexer!  The  
lexing time is 6950ms out of 9800ms for lexing/parsing/tree  
building.  That seems amazing to me...perhaps I'm measuring the lexer  
wrong...hard to figure out that v2 crap with all of its convoluted  
input stuff.

Here is how I measure v2 lexing speed:

             JavaRecognizer parser = new JavaRecognizer(lexer);
             parser.setFilename(f);
             ParserSharedInputState inputState = parser.getInputState();
             TokenBuffer tokens = inputState.getInput();
             long start = System.currentTimeMillis();
             Token t = null;
             int i = 1;
             do {
                 t = tokens.LT(i);
                 i++;
             } while ( t.getType()!=Token.EOF_TYPE );
             long stop = System.currentTimeMillis();
             lexerTime += stop-start;

Does that look close enough?

Verrrrry interesting.  Except for the lexer, v2 parsers/tree seem  
oddly very fast.

Ok, I'm going to do the DFA stuff now with arrays not classes and see  
what difference that makes.  I do note that v3 lexing is already 4x  
faster than v2 lexing!  The parser has very heavy DFA prediction  
though...  I need to optimize the LL(1) case.

Ter