[antlr-interest] Appropriate use of honey badger listeners

Thu Jan 12 16:23:37 PST 2012

You could use a generic ContextAnnotator type, which can be created as
necessary to associate an arbitrary type with parse tree nodes in a way that
can be easily shared among multiple listeners:

class ContextAnnotator<T> {
    private final Map<ParserRuleContext<?>, T> data = new
IdentityHashMap<ParserRuleContext<?>, T>();
    public T getData(ParserRuleContext<?> context) { return
data.get(context); }
    public void setData(ParserRuleContext<?> context, T value) {
data.put(context, value); }
}

class SomeListener extends BlankMyParserListener {
    private final ContextAnnotator<ExtraInfo> extraInfo;

    public SomeListener(ContextAnnotator<ExtraInfo> extraInfo) {
        this.extraInfoAnnotator = extraInfoAnnotator;
    }
}

--
Sam Harwell
Owner, Lead Developer
http://tunnelvisionlabs.com

-----Original Message-----
From: Terence Parr [mailto:parrt at cs.usfca.edu] 
Sent: Thursday, January 12, 2012 5:40 PM
To: ANTLR Interest Mailing List
Subject: Re: [antlr-interest] Appropriate use of honey badger listeners

On Jan 11, 2012, at 8:39 PM, Kyle Ferrio wrote:

> Hi Ter,
> 
> I don't have answers, but I have questions which might at least cut off
some dead ends...
> 
> I agree that point #2 seems pretty clear.  And I realize that this:
> 
>     newtree = listener.someEvent(oldtree);
>     if ( newtree!=oldtree ) replace-oldtree-with-newtree;
> 
> is pseudo-code.  You probably have this all worked out, but it occurs to
me that comparing two trees may be more expensive than necessary.

I would be comparing pointers. if the pointers are different, I would slip
in the new pointer. only the replace operation is pseudocode. In reality it
would simply be "set the ith child pointer to the new pointer".

>   Despite the fact that walking a tree of n nodes is O(n), and so
comparing two trees (non-naively) is also O(n), there may be a lot of data
in those nodes to compare.  If all we care about is detecting changes, a
significant speedup is possible if the interface for a tree object (or for
listeners acting on trees?) includes a flag that gets flipped by any
operation which has the possibility of changing the tree.  (N.B.
Possibility, not guarantee.  A guarantee would require the eval we're trying
to avoid.)  This tiny bit of added API built into a Listener baseclass for
processing tree changes (which are all covered by a small, finite set of
generic operations) would turn the subsequent O(n) comparison into O(1).
I'm not sure how this fits with your design.  Of course, all bets are off if
someone messes with the tree directly.  (N.B. Alternatively, a hash on trees
would be almost as cheap and require no extra API, but that only works if
we're willing to treat topologically equivalent trees with different memory
layouts (e.g. flipping a tree left-to right) as different.)

I don't think we need to do anything fancy. All we are doing is altering
pointers in the tree. If you want to delete a subtree, return null. If you
want to leaves the tree as it is, return the original pointer. If you want
to alter a subtree, return a pointer to the new tree.

> Ok, now point #1.  Not sure how to approach this, but...
> 
> Thought: If the root of a rule r which returns a value does not declare
the type of value expected, two questions arise. One, where/when does the
antlr tool perform type-checking for the rules which consume rule r? 

ANTLR would not be involved in this case because all ANTLR does is create
the parse trees. Our listener would be annotating those trees.

> Two, where/when does the antlr tool perform type-checking of the retvals
from listeners for different alts of r? 

 Again, I don't think ANTLR is involved here. the only way it would get
involved as if we defined return values or parameters that get shoved into
the context objects. And the context objects are the parse tree nodes.

> Perhaps an explicit decoration/declaration of a type T at the root of r is
the easiest, clearest and safest way to communicate to all of the listeners
of all of the alts of r that they must return type T.  My thinking here is
probably clouded by my preternaturally strong preference for static typing.

I too am leaning toward some static typing, but I hesitate to put "returns
[T v]" into the grammar because T it is application-specific. and, more
importantly, it is pass specific. each pass might need a different type so
we would need to do something like "returns [T v, U w]" but then the grammar
is specifically tied to a particular application and literally the parser
would not compile without having those types available. there would be no
way to provide such a generic grammar to users on the ANTLR website.

We need a way to decorate or annotate a node in a type safe way.  To do that
in Java, we have to use of classes or interfaces or something. Certainly, we
can provide a factory that creates the various contexts to use our
subclasses of ParserRuleContext like T and U. The only problem is that the
tree node types can only be sent once: on construction.

What if we had some kind of adapter that fed the listeners proxies? So the
original parse tree has generic nodes like ruleContext and we need T and U
for 2 different phases. As the parse tree walker moved along, it could wrap
the generic ruleContext objects with a proxy, of type T, that had the extra
information we need. T with points at ruleContext instead of using
inheritance.  If we didn't reuse the proxies, however, we would have to
create a new object for every listener event. That's not the end of the
world as I do that during parsing. But, we could think about reusing proxy
objects, though I doubt we would know when the application was going to
refer to multiple contexts. As long as the proxy object popped into
existence for the listener event and popped out of existence afterwards,
should not burden the garbage collector (which does not walk dead objects).

let's see what this would look like. Here is a chunk of the parse tree
walker:

ParseTree.RuleNode r = (ParseTree.RuleNode)t;
enterRule(listener, r);
int n = r.getChildCount();
for (int i = 0; i<n; i++) {
    walk(listener, r.getChild(i));
}       
exitRule(listener, r);

And let's say we have an adapter that knows how to create wrappers for parse
tree nodes:

adaptor.wrap(t); // returns type T or U or whatever you want

may be like this:

class T {
    ParserRuleContext delegate; // hide in a superclass somewhere
    int v; // my field
    public T(ParserRuleContext ctx) { delegate = ctx; }
}

class myAdaptor<T> implements ParseTreeWrapper {
    public T wrap(ParserRuleContext t) { return new T(t); }
}

Then, all we have to do is change the enter event to look like this:

enterRule(listener, adaptor.wrap(r));

The listener event would have to cast the ParserRuleContext to T or U etc.

public void exitRule(AParser.multContext ctx) {
    T t = (T)ctx;
    t.v = t.a.v * t.b.v;
}

MUCH cleaner and type safe.

OTOH, we could reverse it by adding a pointer to every parse tree node that
points to an object with extra fields. Only those parse trees that need
extra fields would have a pointer to an auxiliary object or object extension
if you will. 

Or, As Sam said, we could provide a factory for creating the various parse
tree nodes. If our application needed object extensions, we could pass in a
factory that created special context objects that had our fields. Those
objects would have to have a union of all the values needed by all phases,
but that's not necessarily a bad thing. It's often the case that I want my
type computation information to persist through multiple phases all the way
through to code generation. On the other hand, if we have lots of temporary
values we need just for computation, the tree node would have lots of extra
fields, mucking up our class. I suppose nothing beats the simplicity of a
factory creating the right objects. Then we can create a class like this

class T extends ParserRuleContext {
    int v; // temp value
    TypeInfo type; // persists across tree passes
}

The problem is that we would need a new class definition for every context,
meaning every rule, in the grammar. yuck.  actually, maybe it is not too bad
because we would only need to annotate some of the nodes such as eContext.
We would still need a typecast in the listeners like:  T t = (T)ctx;

It'd be simpler to add a field to ParserRuleContext and had a generic type
because it means creating only a single object:

class ParserRuleContext<T> {
    T extension;
    . existing stuff ...
}

All of the various context object types could point at the single type of
extension object.  I guess it would mean adding code in the listeners like
this:

public void exitRule(AParser.multContext<T> ctx) {
    if ( ctx.extension==null ) ctx.extension = new T();
    T t = ctx.extension;
    T a = ctx.a.extension; // ugh; kind of a mouthful
    T b = ctx.b.extension;
    t.v = a.v * b.v;
}

Hmm.

one thing I should note about the stack solution. I wonder if it might get
confusing knowing what is on the stack and what is not, particularly if not
every event has a listener implementation (that pushed a value). If the
values are stored in the parse tree contexts, you know that that no does not
have a value. With a stack, you would see a value computed way below that
you don't necessarily want. You would not know that it is not the value you
want because it is a disembodied value on the stack. That said, it's
simplicity is appealing. If we need to annotate the tree anyway, however,
maybe we do need a solution like I describe above.

> Well, I'm not sure if any of this helps.  But I'm reasonably confident
that it does not hurt.

"no animals were hurt during the production of this e-mail" :)

Ter

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address