[antlr-interest] stuff I don't like about ANTLR 2.x

Sun Mar 7 11:56:15 PST 2004

The main thing that gets me about Antlr is tree construction and
manipulation.  It is too easy to get loops in your trees.
Here are my suggestions for a revised syntax for tree construction.  As you
can see, they have been around for a while now.
all the best
matthew

Draft specification for Antlr Tree Generation
By Matthew Ford
Revision 1  25th Sept 2001

Section 1: Control of automatic tree generation in the Parser
By default the Parser will automatically generate AST trees.  This
generation can be disabled globally by setting buildAST=false.

When buildAST=false ALL code related to AST tree building is removed and the
only ways to build your own tree are :-
i) to update a global tree or
ii) use Antlr's return syntax to pass your own tree back.
(Note: in this mode you should not need to link or load any AST code unless
you reference it yourself from an action, etc.)

With buildAST = true (that is, the default) you can selectively disable tree
generation by using the ! syntax.  This can be used on either a rule or
token basis.
Example of a rule-based use of ! to disable tree generation:

addition!
    :   INT PLUS i:INT
;

In this case no tree generation code is generated for this rule.  If you
want to create a tree by hand for this rule you need to return it as shown
below:
addition returns [AST return_tree]!
   :   INT PLUS i:INT  { .. code to generate return_tree }
;

So I suggest this be relaxed a little to say that no tree generation code
is output except that labels in the rule are initialized with the
appropriate minimal tree.
For example
drop_table_statement!
: "drop" "table" t3:table_name t4:drop_behavior
;
results in #t3 containing the tree resulting from the rule table_name
and
statement!
: INT PLUS i:INT
;
would set up a tree for label i consisting of a single root node containing
the INT token.
This allows the user to control what tree code is added to their code if
tree generation is turned off for a rule.  If there are no labels then no
code is generated.

To suppress a single token use ! after the token.  It will not be added to
the tree, eg.
statement
:  lhsVar EQUALS rhs SEMI!  // SEMI is not added to the tree.
;

Note that as far as the rule statement is concerned
statement
:   lhsVar EQUALS addition!  // suppress addition of tree returned from addition
;
addition
    :   INT PLUS i:INT
;

is the same as
statement
:   lhsVar EQUALS addition
;
addition!   // suppress tree generation
    :   INT PLUS i:INT
;

But in the second case no rule in the parser can get a tree from the
addition rule.

and
statement
 :  lhsVar EQUALS addition!
;
addition!
    :   INT PLUS i:INT
;

is redundant but legal.

You would probably actually use something like
statement
{AST addTree;}
:   lhsVar EQUALS^ addTree=addition!
   { ## =   build tree here using ## and addTree }
;

addition returns [AST returnTree]
    :   INT PLUS^ i:INT
    { returnTree = ##; }  // pick up the autogenerated tree
;

Note: It makes no sense in this system to allow ! to be applied to
alternatives of rules, that is :-
statement
{AST addTree;}
:   lhsVar EQUALS^ addTree=addition!
    { ## =   build tree here using ## and addTree }
|!  printstatement
;

is now illegal

In all other cases (that is, when buildAST is true and ! is not used) the
return tree is always generated and assigned to the global AST_return to be
picked up by the parent rule.  This AST_return can be modified/overwritten
using the syntax discussed below.

Section 2: Syntax for manual modification of trees in the Parser
Note this is for modification of trees that have been automatically created.
If you set buildAST=false or use ! on a rule, you are on your own as no tree
code is generated for you.

Tree nodes are created using
#[TOKEN_TYPE] or #[TOKEN_TYPE,"text"]

Trees are created using
#(root, c1, ..., cn)
where
 root must be a node
 c1 to cn are the 1st to nth children, which may be either nodes or
 other trees.
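
As a rough illustration of the two constructors above, here is a small
Python sketch.  It is only a model of the notation, not Antlr code: the
Node class, make_tree and the LISP-style printing are all my own
assumptions, with Node(...) standing in for #[TOKEN_TYPE,"text"] and
make_tree(...) standing in for #(root, c1, ..., cn).

```python
class Node:
    """A tree node: a token type, optional text, and an ordered child list."""

    def __init__(self, token_type, text=None):
        # Hypothetical stand-in for #[TOKEN_TYPE] or #[TOKEN_TYPE,"text"].
        self.token_type = token_type
        self.text = text if text is not None else token_type
        self.children = []

    def __str__(self):
        # LISP-style rendering; a leaf prints as its text alone.
        if not self.children:
            return self.text
        return "(" + " ".join([self.text] + [str(c) for c in self.children]) + ")"


def make_tree(root, *children):
    """Hypothetical stand-in for #(root, c1, ..., cn)."""
    # The draft says root must be a node, so a root that already has
    # children (that is, a tree) is rejected here.
    if root.children:
        raise ValueError("root must be a node, not a tree")
    # Children may be single nodes or whole trees.
    root.children = list(children)
    return root


plus = make_tree(Node("PLUS", "+"), Node("INT", "5"), Node("INT", "6"))
nested = make_tree(Node("EQUALS", "="), Node("ID", "x"), plus)
print(plus)
print(nested)
```

Rejecting a root that already has children is one way to read "root must be
a node"; the draft does not say what should happen in that case.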

Elements of the current rule can be addressed using the following:
## is a shortcut for AST_return, the current result tree.
#id is a shortcut for the current tree rooted at the location originally
occupied by the node labelled by id.
@id is a shortcut for the root node of the tree rooted at the location
originally occupied by the node labelled by id.
When these occur on the rhs of = they are replaced by clones of their
respective nodes or trees.  This prevents deadly loops.  As an optimisation,
## = #(#[token],##) could be done without cloning ##.
When these occur on the lhs of = they refer to that location in the tree.
This allows subtree replacements.
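
To show why cloning on the rhs matters, here is a minimal Python sketch of
an expression like #(@a, #a), built once with clones and once without.
Everything in it (Node, clone, has_cycle, the two build functions) is a
made-up model of the proposal, not anything Antlr generates.

```python
class Node:
    def __init__(self, text, children=()):
        self.text = text
        self.children = list(children)


def clone(tree):
    """Deep copy of a tree, as proposed for #id on the rhs."""
    return Node(tree.text, [clone(c) for c in tree.children])


def has_cycle(tree, ancestors=()):
    """True if some node appears among its own ancestors."""
    if any(tree is a for a in ancestors):
        return True
    return any(has_cycle(c, ancestors + (tree,)) for c in tree.children)


def render(tree):
    if not tree.children:
        return tree.text
    return "(" + " ".join([tree.text] + [render(c) for c in tree.children]) + ")"


def build_cloned(a):
    # #(@a, #a) with cloning: a copy of the root node gets a copy of the
    # whole subtree as its only child.
    root = Node(a.text)
    root.children = [clone(a)]
    return root


def build_uncloned(a):
    # The same expression without cloning: the one object is used both as
    # the root and as its own child, which is a loop.
    a.children = [a]
    return a


a = Node("+", [Node("1"), Node("2")])
safe = build_cloned(a)
print(render(safe), has_cycle(safe))
loop = build_uncloned(a)
print(has_cycle(loop))
```

The cloned version leaves the original subtree untouched, so the label still
refers to a valid tree afterwards.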

eg
statement
:   lhsVar e:EQUALS^ a:addition
  {
    #a = #(@a,#[INT,"5"],#[INT,"6"]);
    // the children of the addition subtree in the result (##) have been
    // replaced with 5,6
    // a: now refers to the new subtree; the original subtree has been
    // replaced by it.
    @a = #[MINUS];
    // the root of the new subtree a is now MINUS
    ## = #(#[STATEMENT],##);
    // add a node to the top of the result tree.  a: and e: still point to
    // the same subtrees.
    ## = #[DIV];
    // where do a: and e: point now?  They still point to their subtrees,
    // which are not released until the rule returns,
    // so the following is valid
    ## = #(@#,#a,#[INT,"3"],#a);
    // @# is the root node of ##, which is now just #[DIV]
    // note it is valid to use #a twice as it is cloned.
  }
;
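
For anyone who wants to follow the example step by step, here is a Python
sketch that replays the five assignments and records the result tree after
each one.  It rests on assumptions of mine: that addition produces a PLUS
root with two INT children (shown here as "+", "1" and "2"), that a label
is modelled as a (parent, index) location, and that @a = ... swaps the root
node while keeping its children.  None of this is Antlr output.

```python
class Node:
    def __init__(self, text, children=()):
        self.text = text
        self.children = list(children)


def clone(tree):
    return Node(tree.text, [clone(c) for c in tree.children])


def render(tree):
    if not tree.children:
        return tree.text
    return "(" + " ".join([tree.text] + [render(c) for c in tree.children]) + ")"


def trace():
    """Replay the five assignments; return the result tree after each."""
    # Assumed auto-generated tree for: lhsVar e:EQUALS^ a:addition
    result = Node("=", [Node("lhs"), Node("+", [Node("1"), Node("2")])])
    parent, index = result, 1  # the location labelled a
    steps = []

    # #a = #(@a, #[INT,"5"], #[INT,"6"]);
    old = parent.children[index]
    parent.children[index] = Node(old.text, [Node("5"), Node("6")])
    steps.append(render(result))

    # @a = #[MINUS];  replace the root node at a, keep its children
    parent.children[index] = Node("-", parent.children[index].children)
    steps.append(render(result))

    # ## = #(#[STATEMENT], ##);  the old result is reused without cloning
    result = Node("STATEMENT", [result])
    steps.append(render(result))

    # ## = #[DIV];  label a still refers to its subtree in the old tree
    result = Node("/")
    steps.append(render(result))

    # ## = #(@#, #a, #[INT,"3"], #a);  #a is cloned each time it is used
    sub = parent.children[index]
    result = Node(result.text, [clone(sub), Node("3"), clone(sub)])
    steps.append(render(result))
    return steps


for line in trace():
    print(line)
```

Under those assumptions the final tree is DIV with children MINUS(5,6), 3
and a second, separate copy of MINUS(5,6).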

----- Original Message ----- 
From: "Terence Parr" <parrt at cs.usfca.edu>
To: <antlr-interest at yahoogroups.com>
Sent: Sunday, March 07, 2004 7:05 AM
Subject: [antlr-interest] stuff I don't like about ANTLR 2.x

> Folks,
>
> In preparation for the ANTLR 3.0 whitepaper, I need to start writing
> down everything that annoys me about ANTLR 2.0.  I have started another
> "blog" document:
>
> http://www.antlr.org/blog/antlr3/antlr2.bashing.tml
>
> Feel free to send in your pet peeves to me or to this list.  I will try
> to add to this file.
>
> I can hear John Mitchell now: "predicate hoisting!" ;)
>
> Terence
> --
> Professor Comp. Sci., University of San Francisco
> Creator, ANTLR Parser Generator, http://www.antlr.org
> Cofounder, http://www.jguru.com
> Cofounder, http://www.knowspam.net enjoy email again!
> Cofounder, http://www.peerscope.com pure link sharing
>