[antlr-interest] Re: AST enhancements

Wed Aug 11 17:12:00 PDT 2004

--- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> wrote:
> --- In antlr-interest at yahoogroups.com, "Eric Mahurin"
> <eric_mahurin at y...> wrote:
> > 3. Have a standard interface for separating/storing/retrieving
> > children to the left vs. right of the root so that token order can be
> > easily preserved.  Right now with the way ASTFactory does things, the
> > current interface can be used to get the token order correct, but the
> > "addChild" definition must be changed (insert child at beginning
> > instead of end).  Instead, I think the interface and ASTFactory should
> > assume that the AST's keep order, but the implementation wouldn't have
> > to.  Maybe all that is needed is a redefinition of addChild (insert to
> > the left of the left children), and a few methods which return
> > separate left and right children lists (non-ordered ASTs would return
> > the whole list or nothing for these).
> 
> That's a major interface change which ain't going to happen.  A better
> alternative is provided by Ter's TokenStreamRewriteEngine

I'm not sure how TokenStreamRewriteEngine helps preserve token order
withn an AST structure.

The minimal interface change to implement this is to change the
documentation for addChild to say (leftmost) instead of (rightmost). 
In the ASTFactory, addChild is used to insert children that preceded a
root (is left of) and setFirstChild/setNextSibling is used for
children that come after (is right of).  Right now, addChild is only
used with the root being a simple node (no children/siblings), so
inserting at the leftmost or rightmost position won't matter.  This
change is also needed to root rules rather than tokens because the
child order would get screwed up.

I guess other methods to differentiate left and right children could
just be added to classes that do this.  I guess the minimal thing woud
be just a method to return the number of children to the left of the root.

What I'm concerned about for this is that the way I'm preserving token
order I'm depending on the way ASTFactory call the AST methods.  I'm
depending on addChild being used to add children that precede the root
and setFirstChild/setNextSibling being used for children that follow
the root.  If this will continue to be the case for future antlr's I'm OK.

> > 4. Consider using a more generic "List" like interface storing sibling
> > lists.  This way you can have implementations other than linked lists
> > and you can get easy random access.
> > 
> 
> Why would you need random access as a generic capability?  Besides,
> this can be done on a custom basis already.  Consider having AST nodes
> with List and index fields (getNextSibling and getFirstChild would set
> the index values).

I don't get what you mean by this last sentence.  If you had
get/setNextChild I would see how, but they would have side-effects
(changing the current index).

This was a low priority thing on my list.  The problem is if I wanted
to implement the AST sibling list with a ArrayList or such for
performance reasons.  I may want to have random access to a set of
siblings instead of having to walk the list using getNextSibling. 
With this interface you would have no get/setNextSibling and
get/setFirstChild.  Instead you would have:

public List getChildren();
public setChildren(List children); // list of AST's

You could use all of the List methods to manipulate the children list.
   If you used a LinkedList, it would basically boil down to what you
have now, but with the additional List methods at your disposal.

This is a major interface change and don't expect you to do it, but I
thought I'd mention it.

> > 5. Add some symbol table stuff.  It would be nice to be able to easily
> 
> Under consideration, as is an attribute syntax.  I, personally, am not
> in favor of direct symbol table support--it adds to the work of
> supporting a new target language--but think that an attribute syntax
> makes sense.

I agree that it does complicate things and it may be difficult to come
up with something generic enough, but consider that anytime you parse
a language that has variables, you'll probably need to make some kind
of symbol table.  And there will probably be some kind of scoping to
the variables, so putting this into the AST structure which should
alreay be structured according to the scoping might be a good idea.

> 
> --Loring
> 
> > store symbol table data in the AST.  I haven't thought through the
> > details yet, but I was thinking you'd have some suffix to say you
> > wanted something to be a key in a symbol table and the "children" ASTs
> > would be the value for this entry.  This could work similar to the "^"
> >  AST operator, but instead of making a root with children, it would
> > make a map entry.
> > 
> > Eric

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/