[antlr-interest] Re: AST factory / heterogeneous tree enhancement
micheal_jor
open.zone at virgin.net
Mon Oct 21 17:19:56 PDT 2002
Hi again,
Couldn't stay away so here's a more detailed reply.
--- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
> Ok, Loring and I have discussed the tree factory problems.
> "micheal_jor" <open.zone at v...> brought them up regarding C# and
> Ric seems to have fixed this for C++. So, now the Java solution.
>
> Here is the problem as I understand it.
>
> 1. #[FOO] always builds an AST node of the default type because
> the ASTFactory only knows about the default.
This is an accurate statement of the problem with AST construction
with the Java codegen and [formerly] the C# codegen.
There is also a related issue when the nodetype is specified by
annotating a token reference in a grammar:
aRule
: TOK1<AST=CustomNode.Tok1Node> TOK1<AST=CustomNode.Tok2Node>
;
One additional issue that I would like to introduce relates to token
redefinition. How does one specify a custom ASTNodeType globally for
terminals such as ID and PLUS that aren't originally defined in a
tokens {..} section?
We have tokens defined in the lexer (and therefore without
ASTNodeTypes) that were to be "importVocab'd" into the parser
(parsers actually). We planned to add the ASTNodeTypes (which could
be different for different parsers) in the tokens {...} section in the
parser. Can we use the tokens {...} construct to do this with
terminals like ID and PLUS? Being forced to use per-TokenRef options
is very wasteful/verbose since it will be the same for all IDs.
> In future if you say
>
> tokens {
> PLUS<AST=PLUSNode>;
> ...
> }
>
> then I'll make action #[PLUS] create the right node. You can
> also say
> #[ID,"foo","VarNode"] (3rd arg is the type of node to create).
I presume you meant that _both_ #[PLUS] and #[PLUS, "sometext"] will
be fixed.
I kinda like the extended syntax - I view it (and the per-tokenRef
option) as a sort of local override of the global
TokenType==>ASTNodeType mapping established with setASTNodeClass and
tokens {...}.
In our [informal] ANTLR coding standards, using "local override"
ASTNodeType constructs is the exception rather than the rule.
> 2. dup methods of ASTFactory don't respect the type of the nodes; it
> uses default node type. In future, i'll use
> t.getClass().newInstance()
> to do the dup.
>
The dup() methods ultimately call the factory's create() method. Once
the factory is able to create the right nodes based on its type, the
dup() methods should just work. At least that was the experience with
C#.
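To make the point concrete, here is a rough Java sketch of a factory whose dup() simply delegates to create(). Everything in it (Node, PlusNode, SketchFactory, setNodeType and the token type numbers) is a made-up stand-in for illustration, not the actual ANTLR runtime API. Once create() consults a TokenType==>ASTNodeType table, dup() picks up the right node class without any extra work:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not the ANTLR classes.
class Node {
    int type;
    String text;

    void initialize(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

class PlusNode extends Node {}

class SketchFactory {
    static final int ID = 4, PLUS = 5; // assumed token type numbers

    // Global TokenType ==> ASTNodeType table; unmapped types fall back to Node.
    private final Map<Integer, Class<? extends Node>> nodeTypes = new HashMap<>();

    void setNodeType(int tokenType, Class<? extends Node> cls) {
        nodeTypes.put(tokenType, cls);
    }

    // Instantiate whichever class is registered for this token type.
    Node create(int tokenType, String text) {
        Class<? extends Node> cls = nodeTypes.getOrDefault(tokenType, Node.class);
        try {
            Node n = cls.getDeclaredConstructor().newInstance();
            n.initialize(tokenType, text);
            return n;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
        }
    }

    // dup() needs no type logic of its own: it goes back through create().
    Node dup(Node t) {
        return t == null ? null : create(t.type, t.text);
    }
}
```

With PLUS mapped to PlusNode, duplicating a node built for PLUS yields a fresh PlusNode, while an unmapped type such as ID still comes back as the default Node.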
> 3. hetero tree construction does not call the factory. E.g.,
>
> anIntRule : INT<AST=INTNode> ;
>
> generates
>
> INTNode v = new INTNode(LT(1));
>
> but we need to instead generate:
>
> AST v = (AST)astFactory.create(LT(1),"INTNode");
>
> where the create(...) method is new and specifies the type to
> create. This will use newInstance() instead of "new" by
> default.
>
This contradicts the "Heterogeneous AST" section of the reference
manual which states that "ANTLR uses the factory to create nodes for
which it does not know the specific type".
My opinion is that ANTLR should always use the ASTFactory except for
(1) the new extended AST construction syntax and (2) the per-tokenRef
ASTNodeType option since they effectively "override" the factory's
global view of Token==>ASTNodeType mappings specified with the
setASTNodeClass and the "tokens {...}" options.
I can't actually remember what policy has been (or is to be)
implemented in the C# codegen, but I remember that the pre-existing
mechanism for reading the grammar file and loading the various
options removed the distinction between per-Token and per-TokenRef
ASTNodeType settings for grammar atoms. The GrammarAtom simply has an
ASTNodeType attribute.
So I guess for "all non-manual tree construction requests that
involve per-token or per-tokenref ASTNodeType options" the C# codegen
will (must?) always generate
INTNode v = new INTNode(LT(1));
The specified ASTNodeType may not be the type associated with the
TokenType in the ASTFactory's mapping table so it's safer to just
bypass the factory entirely.
Incidentally, the same will be true of the extended manual tree
construction syntax.
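A small Java sketch of why the bypass is the safer choice, using invented names throughout (AstNode, IntNode, HexIntNode, MappingFactory and the INT token number are assumptions for illustration, not ANTLR or C# runtime classes). The factory's table holds one node type per token type, so a per-tokenRef override that names a different class cannot be served by a lookup keyed on the token type alone:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical types for illustration; not the ANTLR or C# runtime classes.
class AstNode {
    String text;
    AstNode() {}
    AstNode(String text) { this.text = text; }
}

class IntNode extends AstNode {
    IntNode() {}
    IntNode(String text) { super(text); }
}

class HexIntNode extends AstNode {
    HexIntNode(String text) { super(text); }
}

class MappingFactory {
    // One node type per token type: the factory's "global view".
    private final Map<Integer, Supplier<AstNode>> byTokenType = new HashMap<>();

    void register(int tokenType, Supplier<AstNode> maker) {
        byTokenType.put(tokenType, maker);
    }

    AstNode create(int tokenType, String text) {
        AstNode n = byTokenType.getOrDefault(tokenType, AstNode::new).get();
        n.text = text;
        return n;
    }
}

class OverrideDemo {
    static final int INT = 6; // assumed token type number

    // Roughly what a plain INT reference could generate: ask the factory.
    static AstNode viaFactory(MappingFactory f, String text) {
        return f.create(INT, text);
    }

    // Roughly what INT<AST=HexIntNode> could generate: construct directly,
    // because the factory's table only knows INT ==> IntNode.
    static AstNode viaOverride(String text) {
        return new HexIntNode(text);
    }
}
```

If INT is registered as IntNode, viaFactory always returns an IntNode, whereas viaOverride returns the locally requested HexIntNode regardless of what the table says.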
> 4. If you define ID<AST=T> in tokens section then all code in
> grammar "id:ID" should
> define labels as "T id" not "AST id" nor labelASTType id.
Hmmm. Interesting. I don't think either the C++ or the C# codegen
does this. What would be the benefit?
Cheers,
Micheal