[antlr-interest] Translators Should Use Tree Grammars

Tue Nov 16 09:51:52 PST 2004

On Nov 16, 2004, at 9:19 AM, Alexey Demakov wrote:

>
> From: "Terence Parr" <parrt at cs.usfca.edu>
>> Ok, whew!  8 furiously written pages on visitors, tree node classes,
>> tree grammars, and action execution:
>>
>> http://www.antlr.org/article/1100569809276/use.tree.grammars.tml
>>
>> Hopefully, this is clearly written.  Regardless, it summarizes my
>> position rather thoroughly. ;)  Please send me feedback if you think
>> the article can be improved etc...
>
> Ok, I'll try to be advocate of hetero-trees. :)

Using tree grammars has nothing to do with hetero vs homo trees.  i can 
parse both.  Hetero trees are only useful for when you want to 
manipulate them manually.

> The main point: if we have good description of heterohenous tree
> we can generate all we need. In the article hetero-trees are considered
> without tool support. Of course, manual maintenance of node classes is
> difficult. But with proper tool you don't need to write get/set
> methods for chilren and attributes, don't need to write additional 
> checks.

ANTLR could generate classes just as you do, though I don't like the 
style of so many generated tree node types.

>> 1. tree structure is not validated
>
> Visitor should not validate tree structure, because it is described 
> separately
> and checked during tree construction.

Are you sure?  Do you check for missing children with code?  No type 
system can check this for you as null is valid for any type. ;)

>  When tree structure is well-designed,
> you can not create tree that doesn't correspond to syntactically 
> correct input program.

Yes, that is nice (assuming the type system is up to it...only OO 
languages can get close).

> Btw, parser is not right place to check tree correctness because tree 
> can be
> constructed even without parser :) So, tree should be self-validating.

A nice feature, but you must walk the tree anyway, can we agree?  Might 
as well validate as you go; it's free once you have your TreeDL or my 
grammar.

>> Node actions may compute data needed by future actions.
>> Because each action is isolated (usually as a overridden method in 
>> another class),
>> you cannot use the usual programming constructs like local variables, 
>> parameters,
>> and return values to pass information. You must use instance 
>> variables which are
>> clumsy and violate data hiding principles.
>
> If data is related to node (for example, cross-references between 
> references and
> definition), it can be stored in [dynamic] attribute of that node.
> I don't know how to use local variables to pass info to other method :)
> but parameters and return values can be used for visitors. At least in 
> interface methods
> - of course, between visit-methods parameters should be passed using 
> visitor fields.
>
> class ActionVisitor
> {
>     // used by visit-methods
>     Parameters parameters;
>     // set by visit methods
>     // can be used to simulate different return types
>     ReturnValue returnValue;
>
>     public ReturnValue process( Node node, Parameters parameters )
>     {
>         this.parameters = parameters;
>         node.accept( this );
>         return this.returnValue;
>     }
>
>     public void visit( SomeNode node )
>     {
>         // instead of return
>         // unfortunately compiler can't check that all control paths
>         // return value
>         this.returnValue =
>             // node and parameters can be used
>             F( node, parameters );
>     }
> }
>
> Not very good-looking, but it works. Better if it is generated
> from some clear description like usual methods.

Yes, a mechanism could be built but a bit ugly as you say.  Remote 
attribution gives you lots of capabilities too.

> On the other hand hetero-trees allow easy access to child subtree.
> As far as I know in tree grammar we have access only to entities
> mentioned in current rule?

You must write code in a visitor to go to the children.  We use 
patterns (i.e., rule references) in antlr to do that.  Naturally we 
could use actions to go to the children manually ala visitors ;)

>> Manual Tree Walkers
>> 2. manually built tree walkers have no guarantee that their parsing
>> logic is complete and consistent
>
> The same is true for tree parsers - structure of tree built by parser
> can differ from tree used in tree grammar.

Yeah, but the parser will say "yes" or "no", right?  Build what you 
want, then I parse it to check as I have to visit anyway each node.

>> 3. manually building tree walkers is much more laborious than the 
>> visitor pattern
>
> Visiting and tree walking are independent things.
> The main purpose of visitor pattern is implementation of "external 
> virtual methods"
> for tree nodes. Visitors can use different tree walking strateges (or 
> don't use walking at all):

Yep.

> 1. Generated tree walker calls actions for each node. Actions called 
> only from tree walker,
> so only tree walker uses accept() methods of nodes to call correct 
> walk() method of child node.
> Class with action methods need not to implement visitor pattern.
>
> In this case tree walking doesn't require additional work - walker is 
> generated.
> But it is useful only in simple cases when actions can be called 
> before/after subtree walking.
> If you need action/walking mixture, you need manual tree walker (at 
> least some methods overriden).
>
> 2. Manual tree walking using child.accept( this ) in visit methods.
>
> Yes, requires additional work and
>> 1. action and tree structure walking code is entangled; you risk 
>> breaking action code
>> when altering the walking code
> is true
>
> 3. There are special cases when manual tree walking doesn't require 
> additional work.
> For example, for codegeneration I use library that support such 
> templates
>
> class CodeGeneratorVisitor
> {
>     public void visitBinaryExpr( BinaryExpr node )
>     {
>         // template for output
>         txt( "${left} ${sign} ${right}" );
>     }
> }
>
> visit methods for parameters ${...} are called automatically in 
> library method txt().
> (I know, there is drawback - patterns can not be checked at 
> compile-time, will try
> to improve code generation library in the future)
>
> I think the last case is the hint of what should be a source for 
> visitor generation
> - I repeat, description should be clear from implementation details,
> only what we want to express.
>
>> For example, how do you express that a child can be optional?
>
> node PlusNode : ExprNode
> {
>     child ExprNode left;
>     child ExprNode? right;
> }
>
> In TreeDL types don't allow null value implicitly. You need to write 
> '?' that means 'optional'
> as in BNF notation. As far as I remember in some specification 
> languages (VDM)
> type [T] means 'value of type T or null'. May be if it would be used 
> in programming languages,
> NullPointerException would be rare.
> Damn, I need to improve my English :) Hope, you can understand me, but 
> I'm
> sure there is grammatical error.

Your English is great!

So, you generate code in the constructor to validate that the correct 
pointers are null / not-null?  That is good and is required.

>> he says that his manually developed Java tree has about 150 nodes.
>> Regardless, you are still looking at lots of files.
>
> No, I'm looking at single file with tree description. :)

That is true.  Do you ever look at the generated classes though?  You 
talk about manually setting/getting/building etc...  If you never look 
at the output, all is well and i'll change the article ;)

> It has about 1200 lines with some comments.
> Parser is about three times longer.
>
> Errata:
>> enapsulating
> -> encapsulating

oops.

>> For example, why look in n files for n different statement node types?
>> All StatementList can show is, at most, a typed list: 
>> List<StatementNode>.
>> It says absolutely nothing about what the statements are--the 
>> subclasses
>> of StatementNode seem a rather weak and indirect method for specifying
>> such a close relationship.
>
>> You cannot list the expression tree types; one must find all 
>> subclasses of ExprNode.
>
> HTML documentation can be generated from TreeDL tree description
> with cross-references between node type and all inherited node types.

Yes, but what you have is not a grammar unless it explicitly shows the 
possible subtrees.  Why use implicit rather than explicit 
specification.  You are so close to a grammar!  I think the summary is 
that you generate types and I generate methods and I allow actions in 
the grammar rather than visitors. ;)

>> Not only is the tree grammar more explicit (i.e., you are saying 
>> exactly
>> what you mean rather than co-opting a type system),
>
> As I mentioned in one of previous messages, it seems that tree grammar 
> and tree description
> are almost equivalent.

Almost ;)

>> it is smaller,
>
> Than what? Than generated from tree description Java files? Of course.
> Tree description itself also is longer but mainly because it is 
> written line-per-child
> and uses longer keywords :) But I believe TreeDL is more readable 
> (especially
> for newbies) than ANTLR tree grammar (without actions, but with labels 
> for subrules).

Perhaps as they are more familiar with fields than grammars.

>> it is completely self-contained
>
> The same is true for TreeDL tree description.

Minus the fact that stmt node does not list the statements, I might 
agree. ;)

>> and formally guarantees tree structure
>
> Again, it's true for TreeDL. But TreeDL checks trees during 
> construction,
> not during processing.

Correct.  A nice feature.

> Btw, repeating tree grammar when writing each action in some sence is 
> like
> repeating a class declaration in each method that process objects of 
> this class :)
>
>> (as opposed to hand-written walk() methods that might forget to walk 
>> down
>> one of the children).
>
> But not for generated walkers. And when actions are written manually,
> missing of some child walking usually cause missing action.
> In tree grammar also is possible to miss action.

Not a parsing action, just a user action.

> So, chances are similar. :)

more likely that a programmer will screw up the parser than their 
actions within a grammar otherwise we'd not use parser generators ;)

>> I should also point out that Loring Craymer has a
>> means of automatically generating tree grammars from the text parser 
>> grammar
>> that builds trees (to be released in the 2.8 experimental release).
>
> Sounds interesting.
>
>> Context Information
>
> I'm also not very happy with visitors. But the way of passing context 
> information
> can be used not only with ANTLR tree grammars. Some notation can be 
> used as source
> for specification of actions over tree. It can be similar to tree 
> grammar in part of
> context information handling, but don't need to describe full tree 
> structure
> because it is described separately. And from it we can generate 
> visitors or what else we need.
>
> Some time ago, I played with preprocessor that generates visitors from 
> something like this:
>
> node SomeNode
> {
>     visit Visitor1
>     {
>         for( int i = 0; i < $#childList; i++ )
>         {
>             F( $childList[i] )
>         }
>     }
>
>     visit Visitor2
>     {
>         anotherNode.$childOfAnotherNode
>     }
> }
>
> visit() methods are grouped by node, like methods. And children can be 
> referenced using $childName
> shorthand notation because I hate Java long getChildName() and like C# 
> properties.

That is good and is equiv to grammars + actions, right? :)

> May be now it is time to remember this tool and rethink what should be 
> a source
> and what should be generated. I'm inspired by tree grammars :)

Hooray!  Soon we will agree completely minus what the generated code 
looks like :)

>> Multiple Tree Grammars
>
> According to my experience, cut/paste without tool support causes 
> many-many errors.
> "Text editor in skilful hands - an awful weapon" :)

;)

> In any case, tree grammars and descriptions - are two approaches to 
> the same problem.
> I believe that both are powerful enough, both have drawbacks and both 
> can be significantly improved.
> May be it will make compiler construction even easier than today :)

Let's hope!

Thanks for the great discussion.
Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/