[antlr-interest] LL(1), LL(k), LL(*), predicated decisions

Terence Parr parrt at cs.usfca.edu
Mon Jun 5 14:10:24 PDT 2006


Howdy,

As I rebuild the decisions for cyclic DFA, some simplifications  
present themselves.  These new DFA I'm generating are capable of  
handling all decisions, even the ones with predicates.  I wacked a  
few hundred lines of CyclicDFACodeGenerator.java and think that we  
should consider doing the same for ACyclicDFACodeGenerator that  
handles LL(k) decisions.  Right now there are 2 parallel mechanisms  
in the codegen logic and generated code for LL(k) and LL(*) decisions.

My argument would be that having a single codegen strategy is more  
robust, easier to understand, easier to port, and smaller.  The only  
concern would be that it might be a bit slower to do LL(1) decisions  
this way as I'd need a method call to get to the DFA simulation  
code.  OTOH, switches almost always amount to a binary search rather  
than the quick jump via a transition table.

Actually, their is another issue related to (..)? and (..)* blocks.   
For (..)? example, you don't want the DFA testing the lookahead for  
the bypass state.  It delays error detection but is more what people  
expect.  For (X)? people expect "if ( LA(1)==X ) match(X);" rather  
than having an else clause that tests to see if the lookahead is  
consistent with what follows (X)?  Hmmm...seems like I could just  
avoid generating an error msg for optional constructs.  I'll bet that  
would work.  cool.

ok.  Assume we can use the same mechanism for all decisions.  It  
still seems "wrong" to gen a separate DFA for (COMMA ID)*.  We would  
want:

while ( LA(1)== COMMA ) {
   match(COMMA);
   match(ID);
}

Perhaps I should special case only the blocks that are LL(1) and have  
a single alternative.

As for speed, it looks like a method call is not that bad.  Running  
the java parser on jdk 1.4.2 source, yields 4,762,276 decisions which  
would mean that many method calls.  Running a spin loop around a  
method call that is probably not inlined:

         public int count(int f) {
                 return f+1;
         }

yields the following speed in compiled mode:

/tmp $ java SpinCall 4762276
25ms

Naturally, this may perform differently in the actual java parser so  
I added an extra method call to this bogus method:

     int c = 0;
     int n = 0;
     public int foo(IntStream input) {
         return c++;
     }

to *every* decision in the parser.  Parsing/lexing went from 3082ms/ 
1341 to 3145/1400.  That is 3%/5% increase.  So, not that much cost  
for a big improvement in regularity.

Anybody want to comment?

Ter


More information about the antlr-interest mailing list