[antlr-interest] Strange ANTLR behavior when using heterogeneous ASTs

Andrey R. Urazov a_urazov at mail.ru
Mon Apr 26 10:36:21 PDT 2004


On Mon, Apr 26, 2004 at 04:10:48PM +0200, Ric Klaren wrote:
> Looks like an oversight in the code that generates the initialization
> stuff. The heterogeneous stuff was never without headaches.

Maybe it is an oversight, but the behavior looks too elaborate for that.
I think I understand how the algorithm works. Let's look at a slightly
modified `heteroAST' example from the distribution:


tokens {
	PLUS<AST=PLUSNode>;
	STAR<AST=MULTNode>;
	INT;
	SEMI;
}

expr
	:	mexpr (PLUS^ mexpr)* SEMI!
	;

mexpr
	:	atom (STAR^ PLUS atom)*
	;

atom:	INT <AST=INTNode>		// also possible in tokens section
	;

Here I added one more PLUS after the STAR. Now let's see what we get:

void CalcParser::initializeASTFactory( ANTLR_USE_NAMESPACE(antlr)ASTFactory& factory )
{
	factory.registerFactory(4, "PLUSNode", PLUSNode::factory);
	factory.registerFactory(5, "MULTNode", MULTNode::factory);
	factory.registerFactory(4, "PLUSNode", PLUSNode::factory);
	factory.registerFactory(6, "INTNode", INTNode::factory);
	factory.setMaxNodeType(11);
}

The PLUSNode registration line appears twice in the generated code. So
the algorithm apparently tracks every occurrence in the grammar of a
token with a custom AST and generates a registration line for each
occurrence.

In my opinion that is overcomplicated. What is needed is the
UNCONDITIONAL generation of a registration line for EACH custom AST
description in the tokens section. Unconditional because it's impossible
to track all the uses of a particular token --- the user might, for
instance, not rely on ANTLR's special constructs but rather call the
create method on a factory object directly (and even do so outside of
the .g file).
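
To make the difference concrete, here is a rough C++ sketch of the two
strategies. It is not ANTLR's generator code: TokenDecl, TokenRef,
registrationLine, perOccurrence and perDeclaration are hypothetical
names made up for this illustration, and the per-occurrence behavior is
only my guess from the generated output above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one entry in the tokens section.
struct TokenDecl {
    int type;              // token type number, e.g. 4 for PLUS
    std::string name;      // token name, e.g. "PLUS"
    std::string astClass;  // custom AST class, empty if none was given
};

// Hypothetical model of one token reference inside a rule.
struct TokenRef {
    std::string name;         // referenced token
    std::string astOverride;  // element-level <AST=...>, empty if none
};

// Format one registration line in the style of the generated code.
std::string registrationLine(int type, const std::string& astClass) {
    assert(!astClass.empty());
    return "factory.registerFactory(" + std::to_string(type) + ", \"" +
           astClass + "\", " + astClass + "::factory);";
}

// What the generator appears to do (an assumption): emit one line for
// every occurrence of a token that has a custom AST.
std::vector<std::string> perOccurrence(const std::vector<TokenDecl>& tokens,
                                       const std::vector<TokenRef>& refs) {
    std::vector<std::string> lines;
    for (const TokenRef& ref : refs) {
        for (const TokenDecl& decl : tokens) {
            if (decl.name != ref.name) continue;
            const std::string& cls =
                ref.astOverride.empty() ? decl.astClass : ref.astOverride;
            if (!cls.empty()) lines.push_back(registrationLine(decl.type, cls));
        }
    }
    return lines;
}

// The proposal: emit one line per custom AST declared in the tokens
// section, whether or not the token is ever referenced in a rule.
std::vector<std::string> perDeclaration(const std::vector<TokenDecl>& tokens) {
    std::vector<std::string> lines;
    for (const TokenDecl& decl : tokens) {
        if (!decl.astClass.empty())
            lines.push_back(registrationLine(decl.type, decl.astClass));
    }
    return lines;
}
```

Fed with the tokens section above and the token references in grammar
order (PLUS, SEMI, STAR, PLUS, INT<AST=INTNode>), perOccurrence yields
the PLUSNode line twice, while perDeclaration yields it once. Note that
in this sketch perDeclaration only looks at the tokens section, so the
INT override written inside the atom rule would have to be moved there
or handled separately.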

So not only is it logical to assume that a user who specified a custom
AST for some token wanted that binding to be fixed throughout the
grammar, but UNCONDITIONAL generation is also the only implementable
solution if we want real and imaginary tokens to have equal rights.
That last point seems right and natural to me, but I still doubt
whether it was the original intention.

The following is an excerpt from the ANTLR documentation:

 In the grammar, you can override the default class type by setting the
 type for nodes created from a particular INPUT token.

The word INPUT, which I capitalized, is the source of my doubt.


Best regards,
  Andrey Urazov



 