[antlr-interest] Progressive Slowdown in Parsing (code change proposal)

A. Saake asaake at hotmail.de
Sat Dec 27 11:50:57 PST 2008


After changing my grammar to avoid backtracking there was no performance 
gain. I didn't want to change everything to an AST output, because my actual 
grammar works fine and I didn't want to repeat all tests I have done so far.

So I went back to the profiler and had a look to the C# sources from 
TokenRewriteStream. My problem was the code in 
ReduceToSingleOperationPerIndex and GetKindOfOps.

I have to confess that I didn't try to understand the full 
TokenRewriteStream mechanism, but only the mentioned procedures. After some 
tests with my grammar I found a solution, which seems to result in nearly 
linear performance with my grammar. I'm back at 1 minute (was 18 minutes 
before) for my 3500 lines of code!

The change was to start with two empty arraylists in 
ReduceToSingleOperationPerIndex, one for InsertBeforeOps, one for RelaceOps. 
Then follow the loops as in original code. But in this loops the two 
arraylists grow with every processed rewrite, so no need for calling 
GetKindOfOps exists. The purpose of this loops is to eliminate rewrites by 
setting them to null. This has to be done in the new arraylists, too. And 
the arraylists were traversed backwards, what is a difference to the 
original code.

The only part I'm having a bad feeling about is the "continue"-statement 
after "rewrites[i] = null;", probably because I don't understand the whole 
thing. But it was part of the original code, so I didn't change it. Perhaps 
some one with more insight can verify that this is intended behaviour.

Here is my code:

    protected IDictionary ReduceToSingleOperationPerIndex(IList rewrites) {
      ArrayList insertBeforeOps = new ArrayList();
      ArrayList replaceOps = new ArrayList();

      // WALK REPLACES
      for (int i = 0, ni = rewrites.Count; i < ni; i++) {
        object op = rewrites[i];

        //ignored null'd ops
        if (op == null) continue;

        //InsertBeforeOp added to ArrayList, but ignored
        if (op is InsertBeforeOp) {
          insertBeforeOps.Add(op);
          continue;
        }

        //process only, when ReplaceOp
        ReplaceOp rop = op as ReplaceOp;
        if (rop == null) continue;

        //GetKindOfOps(rewrites, insertBeforeOps, replaceOps, lastVisited, 
i);

        // Wipe prior inserts within range
        for (int j = insertBeforeOps.Count -1; j >= 0; j--) {
          InsertBeforeOp iop = (InsertBeforeOp)insertBeforeOps[j];
          if (iop.index >= rop.index && iop.index <= rop.lastIndex) {
            // delete insert as it's a no-op.
            rewrites[iop.instructionIndex] = null;
            insertBeforeOps.RemoveAt(j);
          }
        }

        // Drop any prior replaces contained within
        for (int j = replaceOps.Count-1; j >= 0; j--) {
          ReplaceOp prevRop = (ReplaceOp)replaceOps[j];
          if (prevRop.index >= rop.index && prevRop.lastIndex <= 
rop.lastIndex) {
            // delete replace as it's a no-op.
            rewrites[prevRop.instructionIndex] = null;
            replaceOps.RemoveAt(j);
            continue;
          }

          // throw exception unless disjoint or identical
          bool disjoint =
            prevRop.lastIndex < rop.index || prevRop.index > rop.lastIndex;
          bool same =
            prevRop.index == rop.index && prevRop.lastIndex == 
rop.lastIndex;
          if (!disjoint && !same) {
            throw new ArgumentOutOfRangeException("replace op boundaries of 
" + rop +
                               " overlap with previous " + prevRop);
          }
        }
        //we are sure, it's a ReplaceOp, so add it to replaceOps list
        replaceOps.Add(rop);
      }

      //Reset lists
      insertBeforeOps = new ArrayList();
      replaceOps = new ArrayList();

      // WALK INSERTS
      for (int i = 0, ni = rewrites.Count; i < ni; i++) {
        object op = rewrites[i];

        //ignored null'd ops
        if (op == null) continue;

        //ReplaceOp added to ArrayList, but ignored
        if (op is ReplaceOp) {
          replaceOps.Add(op);
          continue;
        }

        //process only, when InsertBeforeOp
        InsertBeforeOp iop = rewrites[i] as InsertBeforeOp;
        if (iop == null) continue;

        //GetKindOfOps(rewrites, insertBeforeOps, replaceOps, lastVisited, 
i);

        for (int j = insertBeforeOps.Count - 1; j >= 0; j--) {
          InsertBeforeOp prevIop = (InsertBeforeOp)insertBeforeOps[j];
          if (prevIop.index == iop.index) { // combine objects
            // convert to strings...we're in process of toString'ing
            // whole token buffer so no lazy eval issue with any templates
            iop.text = CatOpText(iop.text, prevIop.text);
            // delete redundant prior insert
            rewrites[prevIop.instructionIndex] = null;
            insertBeforeOps.RemoveAt(j);
          }
        }

        //foreach (ReplaceOp rop in replaceOps) {
        for (int j = replaceOps.Count - 1; j >= 0; j--) {
          ReplaceOp rop = (ReplaceOp)replaceOps[j];
          if (iop.index == rop.index) {
            rop.text = CatOpText(iop.text, rop.text);
            rewrites[i] = null;  // delete current insert
            continue;
          }
          if (iop.index >= rop.index && iop.index <= rop.lastIndex) {
            throw new ArgumentOutOfRangeException("insert op " + iop +
                               " within boundaries of previous " + rop);
          }
        }
        if (rewrites[i] != null) {
          insertBeforeOps.Add(iop);
        }
      }

      // System.out.println("rewrites after="+rewrites);
      IDictionary m = new Hashtable();
      for (int i = 0; i < rewrites.Count; i++) {
        RewriteOperation op = (RewriteOperation)rewrites[i];
        if (op == null)
          continue; // ignore deleted ops
        if (m[op.index] != null) {
          throw new Exception("should only be one op per index");
        }
        m[op.index] = op;
      }
      //System.out.println("index to op: "+m);
      return m;
    }


I'd be happy if it would be part of the code base, if it's acceptable and I 
haven't overseen something important.

Best regards,

Andreas Saake

 



More information about the antlr-interest mailing list