[antlr-interest] Appropriate use of honey badger listeners

Kyle Ferrio kferrio at gmail.com
Thu Jan 12 16:35:20 PST 2012


Hold on a sec... I get that if the only way to rewrite a tree is to make a
new tree, then comparing the pointers of the root nodes is the equality
test.  But I was assuming that the "new" tree might actually reuse most of
the old tree and in particular the root node.  I'm pretty sure I missed an
imprtanr assumption in your question.

Re the role of antlr building trees v. Listeners later...Ah...I get it
now.

I've read your email twice.  I will read it again.  Sounds complicated.
Would Sam's idea (stack machine) help simplify this?

I like animals.  So does Honey Badger, just not the same way.

Kyle
 On Jan 12, 2012 4:39 PM, "Terence Parr" <parrt at cs.usfca.edu> wrote:

>
> On Jan 11, 2012, at 8:39 PM, Kyle Ferrio wrote:
>
> > Hi Ter,
> >
> > I don't have answers, but I have questions which might at least cut off
> some dead ends...
> >
> > I agree that point #2 seems pretty clear.  And I realize that this:
> >
> >     newtree = listener.someEvent(oldtree);
> >     if ( newtree!=oldtree ) replace-oldtree-with-newtree;
> >
> > is pseudo-code.  You probably have this all worked out, but it occurs to
> me that comparing two trees may be more expensive than necessary.
>
> I would be comparing pointers. if the pointers are different, I would slip
> in the new pointer. only the replace operation is pseudocode. In reality it
> would simply be "set the ith child pointer to the new pointer".
>
> >   Despite the fact that walking a tree of n nodes is O(n), and so
> comparing two trees (non-naively) is also O(n), there may be a lot of data
> in those nodes to compare.  If all we care about is detecting changes, a
> significant speedup is possible if the interface for a tree object (or for
> listeners acting on trees?) includes a flag that gets flipped by any
> operation which has the possibility of changing the tree.  (N.B.
> Possibility, not guarantee.  A guarantee would require the eval we're
> trying to avoid.)  This tiny bit of added API built into a Listener
> baseclass for processing tree changes (which are all covered by a small,
> finite set of generic operations) would turn the subsequent O(n) comparison
> into O(1).  I'm not sure how this fits with your design.  Of course, all
> bets are off if someone messes with the tree directly.  (N.B.
> Alternatively, a hash on trees would be almost as cheap and require no
> extra API, but that only works if we're willing to treat topologically
> equivalent trees with different memory layouts (e.g. flipping a tree
> left-to right) as different.)
>
> I don't think we need to do anything fancy. All we are doing is altering
> pointers in the tree. If you want to delete a subtree, return null. If you
> want to leaves the tree as it is, return the original pointer. If you want
> to alter a subtree, return a pointer to the new tree.
>
> > Ok, now point #1.  Not sure how to approach this, but...
> >
> > Thought: If the root of a rule r which returns a value does not declare
> the type of value expected, two questions arise. One, where/when does the
> antlr tool perform type-checking for the rules which consume rule r?
>
> ANTLR would not be involved in this case because all ANTLR does is create
> the parse trees. Our listener would be annotating those trees.
>
> > Two, where/when does the antlr tool perform type-checking of the retvals
> from listeners for different alts of r?
>
>  Again, I don't think ANTLR is involved here. the only way it would get
> involved as if we defined return values or parameters that get shoved into
> the context objects. And the context objects are the parse tree nodes.
>
> > Perhaps an explicit decoration/declaration of a type T at the root of r
> is the easiest, clearest and safest way to communicate to all of the
> listeners of all of the alts of r that they must return type T.  My
> thinking here is probably clouded by my preternaturally strong preference
> for static typing.
>
> I too am leaning toward some static typing, but I hesitate to put "returns
> [T v]" into the grammar because T it is application-specific. and, more
> importantly, it is pass specific. each pass might need a different type so
> we would need to do something like "returns [T v, U w]" but then the
> grammar is specifically tied to a particular application and literally the
> parser would not compile without having those types available. there would
> be no way to provide such a generic grammar to users on the ANTLR website.
>
> We need a way to decorate or annotate a node in a type safe way.  To do
> that in Java, we have to use of classes or interfaces or something.
> Certainly, we can provide a factory that creates the various contexts to
> use our subclasses of ParserRuleContext like T and U. The only problem is
> that the tree node types can only be sent once: on construction.
>
> What if we had some kind of adapter that fed the listeners proxies? So the
> original parse tree has generic nodes like ruleContext and we need T and U
> for 2 different phases. As the parse tree walker moved along, it could wrap
> the generic ruleContext objects with a proxy, of type T, that had the extra
> information we need. T with points at ruleContext instead of using
> inheritance.  If we didn't reuse the proxies, however, we would have to
> create a new object for every listener event. That's not the end of the
> world as I do that during parsing. But, we could think about reusing proxy
> objects, though I doubt we would know when the application was going to
> refer to multiple contexts. As long as the proxy object popped into
> existence for the listener event and popped out of existence afterwards,
> should not burden the garbage collector (which does not walk dead objects).
>
> let's see what this would look like. Here is a chunk of the parse tree
> walker:
>
> ParseTree.RuleNode r = (ParseTree.RuleNode)t;
> enterRule(listener, r);
> int n = r.getChildCount();
> for (int i = 0; i<n; i++) {
>    walk(listener, r.getChild(i));
> }
> exitRule(listener, r);
>
> And let's say we have an adapter that knows how to create wrappers for
> parse tree nodes:
>
> adaptor.wrap(t); // returns type T or U or whatever you want
>
> may be like this:
>
> class T {
>    ParserRuleContext delegate; // hide in a superclass somewhere
>    int v; // my field
>    public T(ParserRuleContext ctx) { delegate = ctx; }
> }
>
> class myAdaptor<T> implements ParseTreeWrapper {
>    public T wrap(ParserRuleContext t) { return new T(t); }
> }
>
> Then, all we have to do is change the enter event to look like this:
>
> enterRule(listener, adaptor.wrap(r));
>
> The listener event would have to cast the ParserRuleContext to T or U etc…
>
> public void exitRule(AParser.multContext ctx) {
>    T t = (T)ctx;
>    t.v = t.a.v * t.b.v;
> }
>
> MUCH cleaner and type safe.
>
> OTOH, we could reverse it by adding a pointer to every parse tree node
> that points to an object with extra fields. Only those parse trees that
> need extra fields would have a pointer to an auxiliary object or object
> extension if you will.
>
> Or, As Sam said, we could provide a factory for creating the various parse
> tree nodes. If our application needed object extensions, we could pass in a
> factory that created special context objects that had our fields. Those
> objects would have to have a union of all the values needed by all phases,
> but that's not necessarily a bad thing. It's often the case that I want my
> type computation information to persist through multiple phases all the way
> through to code generation. On the other hand, if we have lots of temporary
> values we need just for computation, the tree node would have lots of extra
> fields, mucking up our class. I suppose nothing beats the simplicity of a
> factory creating the right objects. Then we can create a class like this
>
> class T extends ParserRuleContext {
>    int v; // temp value
>    TypeInfo type; // persists across tree passes
> }
>
> The problem is that we would need a new class definition for every
> context, meaning every rule, in the grammar. yuck.  actually, maybe it is
> not too bad because we would only need to annotate some of the nodes such
> as eContext.  We would still need a typecast in the listeners like:  T t =
> (T)ctx;
>
> It'd be simpler to add a field to ParserRuleContext and had a generic type
> because it means creating only a single object:
>
> class ParserRuleContext<T> {
>    T extension;
>    … existing stuff ...
> }
>
> All of the various context object types could point at the single type of
> extension object.  I guess it would mean adding code in the listeners like
> this:
>
> public void exitRule(AParser.multContext<T> ctx) {
>    if ( ctx.extension==null ) ctx.extension = new T();
>    T t = ctx.extension;
>    T a = ctx.a.extension; // ugh; kind of a mouthful
>    T b = ctx.b.extension;
>    t.v = a.v * b.v;
> }
>
> Hmm…
>
> one thing I should note about the stack solution. I wonder if it might get
> confusing knowing what is on the stack and what is not, particularly if not
> every event has a listener implementation (that pushed a value). If the
> values are stored in the parse tree contexts, you know that that no does
> not have a value. With a stack, you would see a value computed way below
> that you don't necessarily want. You would not know that it is not the
> value you want because it is a disembodied value on the stack. That said,
> it's simplicity is appealing. If we need to annotate the tree anyway,
> however, maybe we do need a solution like I describe above.
>
> > Well, I'm not sure if any of this helps.  But I'm reasonably confident
> that it does not hurt.
>
> "no animals were hurt during the production of this e-mail" :)
>
> Ter
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>


More information about the antlr-interest mailing list