[antlr-interest] Yet another idea for a completely generic TreeParser

Loring Craymer Loring.G.Craymer at jpl.nasa.gov
Mon May 16 13:22:53 PDT 2005


At 04:38 AM 5/16/2005, Scott Stanchfield wrote:
> > [Loring's analysis of ASTNode, ASTModel, and Code generator tree parser
>approaches]
>
>A fairly reasonable analysis, though I don't agree with the comments on 4 & 5
>RE AST vs ASTModel or the final analysis. I do agree with the analysis on
>the generator idea. (It's an interesting idea, but I think too much
>impl/maintenance overhead)
>
>Generality/Capability of ASTNode vs ASTModel is almost the same. I've
>demonstrated that with the code example I posted. I challenge you to show me
>something that the AST node approach can do that the AST Model cannot.

Flexibility is the differentiator here.  With a per-node adaptor, it is
easy to plug in or re-use an adaptor type to deal with a particular
subtree type.  The single-adaptor approach, by contrast, leads to "every
tree composite has its own unique adaptor".  Think of having an adaptor
factory that wraps each node type with an adaptor specific to that node
type.
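
A minimal sketch of the factory idea, under assumed interface and node
names (the ASTNode shape and the IfStatement/Literal node types below are
illustrations only, not ANTLR's API):

// Hypothetical "foreign" node types, standing in for a non-ANTLR tree.
class IfStatement {
    Object condition, thenBranch;
    IfStatement(Object condition, Object thenBranch) {
        this.condition = condition;
        this.thenBranch = thenBranch;
    }
}
class Literal {
    String text;
    Literal(String text) { this.text = text; }
}

// Assumed shape of the per-node interface a tree walker would work against.
interface ASTNode {
    int getChildCount();
    ASTNode getChild(int i);
    String getText();
}

// One small adaptor type per foreign node type--no case statement inside
// any single adaptor.
class IfStatementAdaptor implements ASTNode {
    private final IfStatement node;
    IfStatementAdaptor(IfStatement node) { this.node = node; }
    public int getChildCount() { return 2; }
    public ASTNode getChild(int i) {
        return AdaptorFactory.wrap((i == 0) ? node.condition : node.thenBranch);
    }
    public String getText() { return "if"; }
}
class LiteralAdaptor implements ASTNode {
    private final Literal node;
    LiteralAdaptor(Literal node) { this.node = node; }
    public int getChildCount() { return 0; }
    public ASTNode getChild(int i) { throw new IndexOutOfBoundsException(); }
    public String getText() { return node.text; }
}

// The factory wraps each node with the adaptor specific to that node's type.
class AdaptorFactory {
    static ASTNode wrap(Object node) {
        if (node instanceof IfStatement) return new IfStatementAdaptor((IfStatement) node);
        if (node instanceof Literal)     return new LiteralAdaptor((Literal) node);
        throw new IllegalArgumentException("no adaptor for " + node.getClass());
    }
}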


>However, because ASTModel manages the entire tree, it has more context than
>a single ASTNode, which can increase its capability. For example, the
>ASTModel could use a hashmap of parent/child relationships to represent a
>tree, whereas the ASTNode approach cannot.

Not true--you can always delegate adaptor functionality to a singleton 
state object.  I can't see a case where you would want to, though--the 
intelligence should be in the ANTLR grammar, not the adaptor.
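
If you really did want the tree structure held in a hashmap, a per-node
adaptor can still do it by delegating to a shared state object.  A hedged
sketch, with all names assumed for illustration (and reusing the ASTNode
shape from the sketch above):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Tree-wide state: the parent -> children map lives in one shared object.
class TreeState {
    private static final TreeState INSTANCE = new TreeState();
    static TreeState instance() { return INSTANCE; }

    private final Map<Object, List<Object>> children = new HashMap<Object, List<Object>>();

    void setChildren(Object parent, List<Object> kids) { children.put(parent, kids); }
    List<Object> childrenOf(Object parent) {
        List<Object> kids = children.get(parent);
        return (kids != null) ? kids : new ArrayList<Object>();
    }
}

// A per-node adaptor that keeps no structure of its own; it only delegates.
class DelegatingAdaptor implements ASTNode {
    private final Object node;
    DelegatingAdaptor(Object node) { this.node = node; }

    public int getChildCount() { return TreeState.instance().childrenOf(node).size(); }
    public ASTNode getChild(int i) {
        return new DelegatingAdaptor(TreeState.instance().childrenOf(node).get(i));
    }
    public String getText() { return String.valueOf(node); }
}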



>GC aside, performance of ASTNode vs ASTModel is exactly equivalent if the
>target data cannot be made to implement ASTNode or ASTModel. Both impose an
>extra method call. If the data can be made to implement ASTNode or ASTModel,
>there's no adapter needed, but depending on the actual data structure, the
>ASTModel may impose a single extra method call to get to the real data,
>whereas ASTNodes may be able to be the real data itself.

With a heterogeneous tree, your ASTModel will have an additional case 
statement.  Not a big performance impact, true, but the ASTNode approach 
will be slightly faster since each subtree has its own adaptor and no case 
statement.  The real performance differential is in the typical case where 
you don't need an adaptor, however.
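
To make the "additional case statement" concrete: a single tree-wide model
has to dispatch on node type inside every call, roughly as below (same
assumed IfStatement/Literal node types as in the earlier sketch; the
ASTModel shape is my assumption, not ANTLR API).  A per-node adaptor
resolves the type once, when the factory creates it.

// Single tree-wide adaptor (ASTModel-style): the node-type test is
// repeated on every call.
interface ASTModel {
    int getChildCount(Object node);
    Object getChild(Object node, int i);
    String getText(Object node);
}

class SingleTreeModel implements ASTModel {
    public int getChildCount(Object n) {
        if (n instanceof IfStatement) return 2;
        if (n instanceof Literal)     return 0;
        throw new IllegalArgumentException("unknown node " + n.getClass());
    }
    public Object getChild(Object n, int i) {
        if (n instanceof IfStatement) {
            IfStatement s = (IfStatement) n;
            return (i == 0) ? s.condition : s.thenBranch;
        }
        throw new IndexOutOfBoundsException();
    }
    public String getText(Object n) {
        return (n instanceof Literal) ? ((Literal) n).text : "if";
    }
}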


>[Note that I feel unless you're parsing ANTLR-generated ASTs, the end data
>should definitely *not* implement ASTNode directly, as that ties the data to
>ANTLR. Adapters are the right approach for non-ANTLR generated trees, and
>thus both ASTNode and ASTModel have the same method-call overhead]
>
>
>When GC is considered, the ASTModel approach performs better because fewer
>objects == less frequent GC.
>
>The only other difference between ASTNode and ASTModel is memory usage.
>Stating 50 extra objects is incorrect. You're only considering "single run"
>scenarios.

No--the assumptions are that rules nest a max of about 10 levels deep and 
there are about 10 elements per rule worst case.  In practice, it is 
unlikely that you would keep all references in a rule; more likely, you 
would reuse adaptor reference variables and the total would be down around 
20.  Adaptors don't accumulate--instead, you have a small set of adaptors 
live at any one time.
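
To see why the adaptors don't accumulate, consider a depth-first walk over
the factory-wrapped tree from the earlier sketch: the adaptor for a node
becomes unreachable as soon as its subtree is finished, so at most one
adaptor per nesting level is live at any moment (hypothetical code, again
using the assumed ASTNode/AdaptorFactory names):

// Depth-first walk over factory-wrapped nodes: each child's adaptor lives
// only for the duration of its recursive call, so the number of live
// adaptors tracks nesting depth, not tree size.
class Walker {
    static void walk(ASTNode node) {
        System.out.println(node.getText());
        for (int i = 0; i < node.getChildCount(); i++) {
            walk(node.getChild(i));   // child adaptor becomes garbage after this call
        }
    }

    public static void main(String[] args) {
        walk(AdaptorFactory.wrap(
            new IfStatement(new Literal("x > 0"), new Literal("return"))));
    }
}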


>Again, consider a scenario like parsing eclipse ASTs or a user request on a
>web server. These are long-running processes which will invoke ANTLR
>hundreds, thousands, or millions of times while they're running.
>
>In such a case, 50 extra nodes each run becomes incredibly significant to
>the GC frequency. (For a tree such as an AST of a Java program being edited
>in Eclipse, this number would be significantly higher, likely 300-1000 or more)
>
>Suppose I'm editing for an hour in eclipse and the AST is parsed around 10
>times per minute. That's 600 parses x 50 objects, or 30,000 objects to
>collect. Compare this to 600 objects (ASTModel). [Using a higher estimate of
>500 nodes, we're now at 300,000 objects vs 600]

These are very small numbers for an hour's run--a fraction of a % of the 
total objects garbage collected.  Not to mention that 600 live nodes would 
be a sign of a pathological application.


>Note that the overhead of the ASTModel approach is constant; one object per
>parse. The overhead of the ASTNode approach grows with the size of the tree.

Not true.  It grows with the depth of the tree, not the size of the tree:
only the adaptors along the path currently being walked are live, and
those for subtrees already processed become garbage immediately.



>In addition, the ASTModel approach easily allows the ASTNode approach as
>well. All you need is a "default" ASTModel that works in terms of ASTNodes.
>(Just like Swing's DefaultTreeModel uses TreeNodes.) Alternatively the code
>generator could have a switch to generate one vs the other, which makes the
>performance between ASTNode and ASTModel equal instead of one extra method
>call to use ASTModel to work with ASTNodes.

"It is possible to get the worst of both worlds if you really want to", eh?


>Hence Flexibility is clearly better with the ASTModel, as it allows a
>grammar writer to choose the style they desire.
>
>The optimal solution is clearly the ASTModel approach on all fronts except
>performance (only when compared to Prashant's proposal). Even then, the
>performance difference is a negligible method call.

It imposes an unnecessary penalty for routine use (i.e., it loses on
performance grounds to both of the alternatives), adds another element of
complexity to ANTLR, and the benefits are minimal at best (and I think
even that is being generous).  These are the clear signs of a
creeping-featuritis request.  Good design is often about streamlining and
rarely about adding barnacles (politics can make adding barnacles part of
system design, since they add perceived value for sponsors--that can be
"good" design practice in some cases).

As I've pointed out, it is possible to do everything that you originally
requested with ANTLR 2 and per-live-node adaptors.  If you have problems
that require ANTLR to work with non-ANTLR trees, why haven't you yet
experimented with the per-node adaptor model to see whether it solves
them?

--Loring



>My 50c,
>-- Scott



