[antlr-interest] stuff I don't like about ANTLR 2.x
matthew ford
Matthew.Ford at forward.com.au
Sun Mar 7 11:56:15 PST 2004
The main thing that gets me about Antlr is tree construction and
manipulation. It is too easy to get loops in your trees.
Here are my suggestions for a revised syntax for tree construction. As you
can see, they have been around for a while now.
all the best
matthew
Draft specification for Antlr Tree Generation
By Matthew Ford
Revision 1 25th Sept 2001
Section 1: Control of automatic tree generation in the Parser
By default the Parser will automatically generate AST trees. This generation can be
disabled globally by setting buildAST=false.
When buildAST=false, ALL code related to AST tree building is removed and the
only ways to build your own tree are:
i) to update a global tree, or
ii) to use Antlr's return syntax to pass your own tree back.
(Note: in this mode you should not need to link or load any AST code unless you
reference it yourself from an action, etc.)
With buildAST=true (the default) you can selectively disable tree generation
by using the ! syntax. This can be used on either a rule or a token basis.
Example of a rule-based use of ! to disable tree generation:
addition!
: INT PLUS i:INT
;
In this case no tree generation code is generated for this rule. If you want to create a
tree by hand for this rule, you need to return it as shown below:
addition returns [AST return_tree]!
: INT PLUS i:INT { .. code to generate return_tree }
;
So I suggest this be relaxed a little, to say that no tree generation code is output
except that labels in the rule are initialized with the appropriate minimal tree.
For example
drop_table_statement!
: "drop" "table" t3:table_name t4:drop_behavior
;
results in #t3 containing the tree resulting from the rule table_name,
and
statement!
: INT PLUS i:INT
;
would set up a tree for label i consisting of a single root node containing the INT
token.
This allows the user to control what tree code is added to their code if tree
generation is turned off for a rule. If there are no labels then no code is generated.
To suppress a single token use ! after the token. It will not be added to the tree, e.g.
statement
: lhsVar EQUALS rhs SEMI! // SEMI is not added to the tree.
;
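As a rough illustration of the token-level ! described above, here is a minimal Python sketch. It is only a model of the intended behaviour, not Antlr's generated code: the Node class, the build_rule_tree helper and the (token, suffix) encoding are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical AST node holding just a token type."""
    type: str


def build_rule_tree(elements):
    """Collect matched elements into a flat sibling list, skipping any marked '!'.

    `elements` is a list of (token_type, suffix) pairs, where suffix is ""
    or "!" as written after the element in the grammar (an assumed encoding).
    """
    return [Node(tok) for tok, suffix in elements if suffix != "!"]


# Models:  statement : lhsVar EQUALS rhs SEMI! ;
tree = build_rule_tree([("lhsVar", ""), ("EQUALS", ""), ("rhs", ""), ("SEMI", "!")])
print([n.type for n in tree])  # ['lhsVar', 'EQUALS', 'rhs']
```

SEMI is still matched by the parser in this model; it just never becomes a node.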
Note that, as far as the rule statement is concerned,
statement
: lhsVar EQUALS addition! // suppress addition of tree returned from addition
;
addition
: INT PLUS i:INT
;
is the same as
statement
: lhsVar EQUALS addition
;
addition! // suppress tree generation
: INT PLUS i:INT
;
But in the second case no rule in the parser can get a tree from the addition rule,
and
statement
: lhsVar EQUALS addition!
;
addition!
: INT PLUS i:INT
;
is redundant but legal.
You would probably actually use something like
statement
{AST addTree;}
: lhsVar EQUALS^ addTree=addition!
{ ## = build tree here using ## and addTree }
;
addition returns [AST returnTree]
: INT PLUS^ i:INT
{ returnTree = ##; } // pick up the autogenerated tree
;
Note: It makes no sense in this system to allow ! to be applied to an alternative of a rule,
that is:
statement
{AST addTree;}
: lhsVar EQUALS^ addTree=addition!
{ ## = build tree here using ## and addTree }
|! printstatement
;
is now illegal.
In all other cases (that is, when buildAST is true and ! is not used) the return tree is
always generated and assigned to the global AST_return to be picked up by the parent
rule. This AST_return can be modified or overwritten using the syntax discussed below.
Section 2: Syntax for manual modification of trees in the Parser
Note that this is for modification of trees that have been automatically created. If you set
buildAST=false or use ! on a rule, you are on your own, as no tree code is generated
for you.
Tree nodes are created using
#[TOKEN_TYPE] or #[TOKEN_TYPE,"text"]
Trees are created using
#(root, c1, ..., cn)
where
root must be a node
c1 to cn are the 1st to nth children, which may be either nodes or other trees.
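To make the two constructors concrete, here is a small Python model of them. It is a sketch under assumptions: the Node class and the node, tree and show helpers are names invented for this illustration, and the check that the root is a bare node is just one possible reading of "root must be a node".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: token type, optional text, ordered children."""
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def node(token_type, text=""):
    """Models #[TOKEN_TYPE] and #[TOKEN_TYPE,"text"]: one node, no children."""
    return Node(token_type, text)


def tree(root, *children):
    """Models #(root, c1, ..., cn): root must be a bare node."""
    if root.children:
        raise ValueError("root must be a node, not a tree")
    root.children = list(children)
    return root


def show(t):
    """Render a tree in LISP-like form for inspection."""
    label = t.text or t.type
    if not t.children:
        return label
    return "(" + " ".join([label] + [show(c) for c in t.children]) + ")"


# Models:  #(#[PLUS], #[INT,"1"], #(#[MINUS], #[INT,"2"], #[INT,"3"]))
t = tree(node("PLUS"), node("INT", "1"),
         tree(node("MINUS"), node("INT", "2"), node("INT", "3")))
print(show(t))  # (PLUS 1 (MINUS 2 3))
```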
Elements of the current rule can be addressed using the following:
## is a shortcut for AST_return, the current result tree.
#id is a shortcut for the current tree rooted at the location originally occupied by the
node labelled by id.
@id is a shortcut for the root node of the tree rooted at the location originally
occupied by the node labelled by id.
When these occur on the rhs of =, they are replaced by clones of their respective nodes
or trees. This prevents deadly loops. As an optimisation, ## = #(#[token],##) could be
done without cloning ##.
When these occur on the lhs of =, they refer to that location in the tree. This allows
subtree replacements.
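The following Python sketch shows why cloning on the rhs matters. It assumes a child-sibling node layout in the style of Antlr 2.x ASTs; the Node class and the make_tree, clone and has_sibling_loop helpers are hypothetical names for this illustration only.

```python
class Node:
    """Hypothetical child-sibling node: first child plus next sibling links."""

    def __init__(self, type, text=""):
        self.type, self.text = type, text
        self.first_child = None
        self.next_sibling = None


def make_tree(root, *children):
    """Link the children under root as a sibling chain, with no cloning."""
    prev = None
    for c in children:
        if prev is None:
            root.first_child = c
        else:
            prev.next_sibling = c
        prev = c
    return root


def clone(n):
    """Copy a node and its children, detached from any siblings."""
    kids = []
    k = n.first_child
    while k is not None:
        kids.append(clone(k))
        k = k.next_sibling
    return make_tree(Node(n.type, n.text), *kids)


def has_sibling_loop(root):
    """Return True if the chain of root's children revisits a node."""
    seen = set()
    n = root.first_child
    while n is not None:
        if id(n) in seen:
            return True
        seen.add(id(n))
        n = n.next_sibling
    return False


a = Node("INT", "5")
safe = make_tree(Node("PLUS"), clone(a), clone(a))  # each use gets its own copy
looped = make_tree(Node("PLUS"), a, a)              # a.next_sibling is now a itself
print(has_sibling_loop(safe), has_sibling_loop(looped))  # False True
```

Without the clone, using the same node twice makes it its own next sibling, so any tree walker that follows the sibling chain never terminates.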
e.g.
statement
: lhsVar e:EQUALS^ a:addition
{
#a = #(@a,#[INT,"5"],#[INT,"6"]);
// the children of the addition subtree in the result (##) have been replaced with 5,6
// a: now refers to the new subtree; the original subtree has been replaced by it.
@a = #[MINUS];
// the root of the new subtree a is now MINUS
## = #(#[STATEMENT],##);
// add a node to the top of the result tree. a: and e: still point to the same subtrees.
## = #[DIV];
// where do a: and e: point now? They still point to their subtrees, which are not
// released until the rule returns.
// so the following is valid
## = #(@#,#a,#[INT,"3"],#a);
// @# is the root node of ##, which is now just #[DIV]
// note it is valid to use #a twice as it is cloned.
}
;
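The example can be traced step by step with a small Python model. This is one possible reading of the draft, not a definitive one: it assumes that a label's "location" behaves like an object whose contents are overwritten in place, that addition builds PLUS over two INT nodes, and that lhsVar yields a single ID node. The Node class and the tree, assign_tree and assign_root helpers are invented for this sketch.

```python
class Node:
    """Hypothetical AST node with an ordered child list."""

    def __init__(self, type, text="", children=()):
        self.type, self.text = type, text
        self.children = list(children)

    def clone(self):
        """Deep copy, as assumed for #id on the rhs of =."""
        return Node(self.type, self.text, [c.clone() for c in self.children])

    def root_only(self):
        """Copy of the root node alone, as assumed for @id on the rhs of =."""
        return Node(self.type, self.text)

    def __repr__(self):
        label = self.text or self.type
        if not self.children:
            return label
        return "(" + " ".join([label] + [repr(c) for c in self.children]) + ")"


def tree(root, *children):
    """Models #(root, c1, ..., cn)."""
    root.children = list(children)
    return root


def assign_tree(target, new):
    """Models  #id = new : replace the whole subtree at id's location."""
    target.type, target.text, target.children = new.type, new.text, new.children


def assign_root(target, new):
    """Models  @id = new : replace only the root node, keeping the children."""
    target.type, target.text = new.type, new.text


# Automatically built tree for:  lhsVar e:EQUALS^ a:addition
lhs = Node("ID", "x")
a = Node("PLUS", children=[Node("INT", "1"), Node("INT", "2")])
e = Node("EQUALS")
result = tree(e, lhs, a)                      # ## is (EQUALS x (PLUS 1 2))
trace = [repr(result)]

assign_tree(a, tree(a.root_only(), Node("INT", "5"), Node("INT", "6")))
trace.append(repr(result))                    # (EQUALS x (PLUS 5 6))

assign_root(a, Node("MINUS"))
trace.append(repr(result))                    # (EQUALS x (MINUS 5 6))

result = tree(Node("STATEMENT"), result)      # the no-clone optimisation
trace.append(repr(result))                    # (STATEMENT (EQUALS x (MINUS 5 6)))

result = Node("DIV")                          # a and e keep their subtrees
result = tree(result.root_only(), a.clone(), Node("INT", "3"), a.clone())
trace.append(repr(result))                    # (DIV (MINUS 5 6) 3 (MINUS 5 6))

print("\n".join(trace))
```

The two MINUS subtrees in the final result are separate copies, which is what makes the double use of #a safe.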
----- Original Message -----
From: "Terence Parr" <parrt at cs.usfca.edu>
To: <antlr-interest at yahoogroups.com>
Sent: Sunday, March 07, 2004 7:05 AM
Subject: [antlr-interest] stuff I don't like about ANTLR 2.x
> Folks,
>
> In preparation for the ANTLR 3.0 whitepaper, I need to start writing
> down everything that annoys me about ANTLR 2.0. I have started another
> "blog" document:
>
> http://www.antlr.org/blog/antlr3/antlr2.bashing.tml
>
> Feel free to send in your pet peeves to me or to this list. I will try
> to add to this file.
>
> I can hear John Mitchell now: "predicate hoisting!" ;)
>
> Terence
> --
> Professor Comp. Sci., University of San Francisco
> Creator, ANTLR Parser Generator, http://www.antlr.org
> Cofounder, http://www.jguru.com
> Cofounder, http://www.knowspam.net enjoy email again!
> Cofounder, http://www.peerscope.com pure link sharing
>