[antlr-interest] Re: AST factory / heterogeneous tree enhancement

Mon Oct 21 23:34:08 PDT 2002

Ter, Micheal--

Some comments embedded below.

--Loring

--- In antlr-interest at y..., "micheal_jor" <open.zone at v...> wrote:

> One additional issue that I would like to introduce relates to token
> redefinition. How does one specify a custom ASTNodeType globally for
> terminals such as ID and PLUS that aren't originally defined in a
> tokens {..} section?
> 
> We have tokens defined in the lexer (and therefore without
> ASTNodeTypes) that were to be "importVocab'd" into the parser
> (parsers actually). We planned to add the ASTNodeTypes (which could
> be different for different parsers) in the tokens section in the
> parser. Can we use the tokens {...} construct to do this with
> terminals like ID and PLUS? Being forced to use per-TokenRef options
> is very wasteful/verbose since it will be the same for all IDs.

That works now (2.7.1), except for tokens created by #[ ... ]

> 
> > In future if you say
> > 
> >      tokens {
> >          PLUS<AST=PLUSNode>;
> >          ...
> >      }
> > 
> >      then I'll make action #[PLUS] create the right node.  You can
> > also say
> >      #[ID,"foo","VarNode"] (3rd arg is the type of node to create).
> 
> I presume you meant that _both_ #[PLUS] and #[PLUS, "sometext"] will
> be fixed.
>
> I kinda like the extended syntax - I view it (and the per-tokenRef
> option) as a sort of local override of the global
> TokenType==>ASTNodeType mapping established with setASTNodeClass and
> tokens {...}.
> 
> In our [informal] ANTLR coding standards, using "local override" 
> ASTNodeType constructs is the exception rather than the rule.
> 
> > 2. dup methods of ASTFactory don't respect the type of the nodes; it
> >      uses default node type.  In future, i'll use
> >      t.getClass().newInstance()
> >      to do the dup.
> > 
> 
> The dup() methods ultimately call the factory's create() method. Once
> the factory is able to create the right nodes based on its type, the
> dup() methods should just work. At least that was the experience with
> C#.

No, I had to fix that for adding tree construction syntax, and Ter is 
adopting my fix.  dupXX() should duplicate the AST node(s) exactly, 
not re-create them according to token type.  That fits with the
capability of specifying the class when creating a token.
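
To make the difference concrete, here is a minimal Java sketch of the two
dup strategies. Node, PlusNode and SketchFactory are hypothetical stand-ins
invented for illustration, not the real antlr.* classes, and the methods
only approximate what ASTFactory does. It assumes node classes have a
no-arg constructor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for AST node classes; not the real antlr.* API.
class Node {
    int type;
    String text;
    void initialize(Node other) { type = other.type; text = other.text; }
}

class PlusNode extends Node {}

class SketchFactory {
    // Token type -> node class, as a global mapping would record it.
    private final Map<Integer, Class<? extends Node>> byType = new HashMap<>();

    void setNodeClass(int tokenType, Class<? extends Node> c) {
        byType.put(tokenType, c);
    }

    private static Node instantiate(Class<? extends Node> c) {
        try {
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("node class needs a no-arg constructor", e);
        }
    }

    // Creates a node from the token-type table (default: plain Node).
    Node create(int tokenType, String text) {
        Node n = instantiate(byType.getOrDefault(tokenType, Node.class));
        n.type = tokenType;
        n.text = text;
        return n;
    }

    // dup via table lookup: a class chosen by a local override is lost.
    Node dupByType(Node t) {
        return create(t.type, t.text);
    }

    // dup via the node's own class: the copy always has the same class as t.
    Node dupExact(Node t) {
        Node n = instantiate(t.getClass());
        n.initialize(t);
        return n;
    }
}
```

With no table entry for the token type, dupByType() on a PlusNode hands back
a plain Node, while dupExact() hands back another PlusNode.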

Hmm--Ter, there needs to be a policy for AST node typing during tree 
transformations.  Should tree walkers have their own type tables, and 
default to duplicating nodes unless there is a type/class mapping?  
Then AST node information is preserved unless the AST type is 
overridden.
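
One way such a policy could look, purely as a sketch: each tree walker
carries its own table and only consults it for overrides. TNode, VarNode,
RefNode and WalkerTypes are made-up names for illustration, not anything in
ANTLR, and the no-arg constructor requirement is an assumption of this
sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node classes for illustration; not ANTLR's real AST types.
class TNode {
    int type;
    String text;
}

class VarNode extends TNode {}
class RefNode extends TNode {}

// A per-walker type table: token type -> node class to use in the output tree.
class WalkerTypes {
    private final Map<Integer, Class<? extends TNode>> overrides = new HashMap<>();

    void override(int tokenType, Class<? extends TNode> c) {
        overrides.put(tokenType, c);
    }

    // Copy a node: use the override if one exists, else keep the node's own class.
    TNode copy(TNode in) {
        Class<? extends TNode> c = overrides.get(in.type);
        if (c == null) {
            c = in.getClass();
        }
        try {
            TNode out = c.getDeclaredConstructor().newInstance();
            out.type = in.type;
            out.text = in.text;
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("node class needs a no-arg constructor", e);
        }
    }
}
```

A walker with an empty table copies a VarNode as a VarNode; after
override(4, RefNode.class), nodes of token type 4 come out as RefNode and
everything else is still duplicated as-is.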

> > 3. hetero tree construction does not call the factory.  E.g.,
> > 
> >      anIntRule : INT<AST=INTNode> ;
> > 
> >      generates
> > 
> >      INTNode v = new INTNode(LT(1));
> > 
> >      but we need to instead generate:
> > 
> >      AST v = (AST)astFactory.create(LT(1),"INTNode");
> > 
> >      where the create(...) method is new and specifies the type to
> >      create.  This will use newInstance() instead of "new" by 
> >      default.
> > 
> 
> This contradicts the "Heterogeneous AST" section of the reference
> manual which states that "ANTLR uses the factory to create nodes for
> which it does not know the specific type".
>
> My opinion is that ANTLR should always use the ASTFactory except for
> (1) the new extended AST construction syntax and (2) the per-tokenRef

Actually, Ter cleverly turns #[ XXX ] into factory.create( XXX ).

> ASTNodeType option since they effectively "override" the factory's 
> global view of Token==>ASTNodeType mappings specified with the 
> setASTNodeClass and the "tokens {...}" options.
> 
> I can't actually remember what policy has been (or is to be)
> implemented in the C# codegen, but I remember that the pre-existing
> mechanism for reading the grammar file and loading the various
> options removed the distinction between per-Token and per-TokenRef
> ASTNodeType settings for grammar atoms. The GrammarAtom simply has an
> ASTNodeType attribute.
> 
> So I guess for "all non-manual tree construction requests that
> involve per-token or per-tokenref ASTNodeType options" the C# codegen
> will (must?) always generate
>    INTNode v = new INTNode(LT(1));
> 
> The specified ASTNodeType may not be the type associated with the 
> TokenType in the ASTFactory's mapping table so it's safer to just 
> bypass the factory entirely.
> 
> Incidentally, the same will be true of the extended manual tree 
> construction syntax. 
> 
> > 4. If you define ID<AST=T> in tokens section then all code in grammar
> > "id:ID" should
> >      define labels as "T id" not "AST id" nor labelASTType id.
> 
> Hmmm. Interesting. I don't think either the C++ or the C# codegen
> does this. What would be the benefit?
> 
> Cheers,
> 
> Micheal
