[antlr-interest] Yet another idea for a completlygenericTreeParser

Mon May 16 04:38:02 PDT 2005

> [Loring's analysis of ASTNode, ASTModel, and Code generator tree parser
approaches]

A fairly resonable analysis, though I don't agree with the comments on 4&5
RE AST vs ASTModel or the final analysis. I do agree with the analysis on
the generator idea. (It's an interesting idea, but I think too much
impl/maintenance overhead)

Generality/Capability of ASTNode vs ASTModel is almost the same. I've
demonstrated that with the code example I posted. I challenge you to show me
something that the AST node approach can do that the AST Model cannot.

However, because ASTModel manages the entire tree, it has more context than
a single ASTNode, which can increase its capability. For example, the
ASTModel could use a hashmap of parent/child relationships to represent a
tree, whereas the ASTNode approach cannot.

GC aside, performance of ASTNode vs ASTModel is exactly equivalent if the
target data cannot be made to implement ASTNode or ASTModel. Both impose an
extra method call. If the data can be made to implement ASTNode or ASTModel,
there's no adapter needed, but depending on the actual data structure, the
ASTModel may impose a single extra method call to get to the real data,
whereas ASTNodes may be able to be the real data itself.

[Note that I feel unless you're parsing ANTLR-generated ASTs, the end data
should definitely *not* implement ASTNode directly, as that ties the data to
ANTLR. Adapters are the right approach for non-ANTLR generated trees, and
thus both ASTNode and ASTModel have the same method-call overhead]

When GC is considered, the ASTModel approach performs better because fewer
objects == less frequent GC.

The only other difference between ASTNode and ASTModel is memory usage.
Stating 50 extra objects is incorrect. You're only considering "single run"
scenarios.

Again, consider a scenario like parsing eclipse ASTs or a user request on a
web server. These are long-running processes which will invoke ANTLR
hundreds, thousands, or millions of times while they're running.

In such a case, 50 extra nodes each run becomes incredibly significant to
the GC frequency. (For a tree such as an AST of a Java program being edited
in Eclipse, this number would significantly higher, likely 300-1000 or more)

Suppose I'm editing for an hour in eclipse and the AST is parsed around 10
times per minute. That's 600 parses, x 50 objects, or 3000 objects to
collect. Compare this to 600 objects (ASTModel). [Using a higher estimate of
500 nodes, we're now at 30000 objects vs 600]

Note that the overhead of the ASTModel approach is constant; one object per
parse. The overhead of the ASTNode approach grows with the size of the tree.

In addition, the ASTModel approach easily allows the ASTNode approach as
well. All you need is a "default" ASTModel that works in terms of ASTNodes.
(Just like Swing's DefaultTreeModel uses TreeNodes.) Alternatively the code
generator could have a switch to generate one vs the other, which makes the
performance between ASTNode and ASTModel equal instead of one extra method
call to use ASTModel to work with ASTNodes.

Hence Flexibility is clearly better with the ASTModel, as it allows a
grammar writer to choose the style they desire.

The optimal solution is clearly the ASTModel approach on all fronts except
performance (only when compared to Prashant's proposal). Even then, the
performance difference is a negligible method call.

My 50c,
-- Scott