[antlr-interest] Suggestion: Parameterized Productions

Austin Hastings Austin_Hastings at Yahoo.com
Sat Oct 6 07:38:03 PDT 2007


This is a feature request/suggestion:

In a language like C, there are type specifications and then anonymous 
type specifications:

/* Anonymous types used for parameter info: */
extern char *strcpy(char *, const char *);

/* Explicit declarations required for function definition: */
char *strcpy(char *to, const char *from)
{
    char *ret = to;
    while ((*to++ = *from++) != '\0')
        ;
    return ret;
}

The problem these cause is that the grammar needs a rather complex
section for recognizing declarations and then a second, parallel
section for recognizing anonymous (abstract) declarations:

declarator
	: pointer? direct_declarator
	| pointer
	;

abstract_declarator
	: pointer direct_abstract_declarator?
	| direct_abstract_declarator
	;

direct_declarator
	: ( IDENTIFIER
	  | '(' declarator ')'
	  )
        declarator_suffix*
	;

direct_abstract_declarator
	:	( '(' abstract_declarator ')' | abstract_declarator_suffix ) abstract_declarator_suffix*
	;

 
(Excerpts from Terence's ANSI-C grammar.)

There are four related-but-slightly-different production trees in the C 
grammar for simple variables, structures, parameters, and sizeof.

I propose adding a parameterized-production mechanism so that this
duplication can be eliminated. The ANTLR syntax would probably resemble
the existing parameter mechanism for rules, which become parameters of
the generated functions. The difference is that these parameters would
be handled internally by ANTLR when it generates the parser, rather than
passed into the generated code.

The point would be to plug production subtrees, including empty ones,
into rules. This would allow the shared structure to be reused, with
only a small set of customized rules needed to handle the special cases.
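A hand-written recursive-descent sketch can approximate the idea, with
one big caveat: here the parameter is an ordinary run-time argument,
whereas the proposal is for ANTLR to resolve it at generation time. The
NameMode enum, the method names and the cut-down suffixes (only "[]" and
an empty "()") are all hypothetical; the point is only that one
declarator rule, given a parameter, covers both declarator and
abstract_declarator.

```java
import java.util.List;

/** Hypothetical sketch: one declarator rule parameterized by NameMode. Tokens are plain strings. */
class ParamDeclarator {
    /** Illustrative parameter values; this is not ANTLR syntax. */
    enum NameMode { REQUIRED, FORBIDDEN, OPTIONAL }

    private final List<String> toks;
    private int pos = 0;
    private String name = null; // identifier seen, if any

    private ParamDeclarator(List<String> toks) { this.toks = toks; }

    private String peek(int k) {
        return pos + k < toks.size() ? toks.get(pos + k) : "<EOF>";
    }

    private void expect(String t) {
        if (!peek(0).equals(t)) {
            throw new IllegalArgumentException("expected " + t + " but saw " + peek(0));
        }
        pos++;
    }

    private static boolean isIdent(String t) {
        return Character.isJavaIdentifierStart(t.charAt(0));
    }

    // declarator[mode] : '*'* direct_declarator[mode]
    private void declarator(NameMode mode) {
        while (peek(0).equals("*")) pos++;
        directDeclarator(mode);
    }

    // direct_declarator[mode] : ( IDENTIFIER | '(' declarator[mode] ')' )? suffix*
    // REQUIRED makes the head mandatory; FORBIDDEN rejects IDENTIFIER.
    private void directDeclarator(NameMode mode) {
        String t = peek(0);
        boolean nested = t.equals("(")
                && (peek(1).equals("*") || (mode != NameMode.FORBIDDEN && isIdent(peek(1))));
        if (isIdent(t)) {
            if (mode == NameMode.FORBIDDEN) throw new IllegalArgumentException("unexpected name " + t);
            name = t;
            pos++;
        } else if (nested) {
            pos++;
            declarator(mode);
            expect(")");
        } else if (mode == NameMode.REQUIRED) {
            throw new IllegalArgumentException("missing name, saw " + t);
        }
        // Simplified suffixes: "[" "]" and an empty parameter list "(" ")".
        while (true) {
            if (peek(0).equals("[")) { pos++; expect("]"); }
            else if (peek(0).equals("(")) { pos++; expect(")"); }
            else break;
        }
    }

    /** Parses the whole token list with the given mode; returns the name, or null if none. */
    static String parse(List<String> toks, NameMode mode) {
        ParamDeclarator p = new ParamDeclarator(toks);
        p.declarator(mode);
        if (p.pos != toks.size()) throw new IllegalArgumentException("trailing tokens");
        return p.name;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("*", "to"), NameMode.REQUIRED));                // to
        System.out.println(parse(List.of("*"), NameMode.FORBIDDEN));                     // null
        System.out.println(parse(List.of("(", "*", ")", "(", ")"), NameMode.FORBIDDEN)); // null
    }
}
```

The same rule body serves the "char * to" parameter (REQUIRED) and the
"char *" parameter (FORBIDDEN); a parameter list would call it with
OPTIONAL.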

The customized rules would also need a way to recognize "context."
This corresponds to the existing pattern of using
"if ($declaration.size()>0)" to determine whether a parent rule is
active. There seem to be two cases here: detection at generation time
of the path to a node, and detection at run time of the path to a node.
 
Currently, the same ANSI-C grammar includes this snippet:

direct_declarator
	:   (	IDENTIFIER
			{
			if ($declaration.size()>0&&$declaration::isTypedef) {
				$Symbols::types.add($IDENTIFIER.text);
				System.out.println("define type "+$IDENTIFIER.text);
			}
			}
		|	'(' declarator ')'
		)
        declarator_suffix*
	;

The check on $declaration.size() tests the generation-time path to the
node, but does so at run time. The next condition,
&& $declaration::isTypedef, tests a run-time indicator of which case
inside the declaration production is being followed.

The size check is needed because the production for a function
definition refers to a declarator in its syntactic predicate, so
direct_declarator can be reached without any enclosing declaration: the
only difference between a declaration and a function definition is the
closing ';' versus the opening '{'.
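The two conditions can be modeled outside ANTLR with an explicit stack
of frames, assuming (as a simplification of what the generated parser
does) that one frame is pushed each time declaration is entered and
popped when it exits. The class and method names are hypothetical; only
the two conditions come from the grammar.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the run-time checks made by the direct_declarator action. */
class ScopeCheck {
    /** Stands in for the declaration rule's scope { boolean isTypedef; }. */
    static final class DeclarationScope {
        final boolean isTypedef;
        DeclarationScope(boolean isTypedef) { this.isTypedef = isTypedef; }
    }

    // One frame per active invocation of declaration; empty on other paths.
    final Deque<DeclarationScope> declarationStack = new ArrayDeque<>();
    // Stands in for $Symbols::types.
    final Set<String> types = new HashSet<>();

    /** Models entering declaration: push a frame, parse the declarator, pop the frame. */
    void declaration(boolean isTypedef, String identifier) {
        declarationStack.push(new DeclarationScope(isTypedef));
        try {
            directDeclarator(identifier);
        } finally {
            declarationStack.pop();
        }
    }

    /** Models a function definition, which reaches the declarator with no declaration frame. */
    void functionDefinition(String identifier) {
        directDeclarator(identifier);
    }

    /** The action itself. */
    void directDeclarator(String identifier) {
        // Path check ($declaration.size()>0): is any declaration active at all?
        // Case check ($declaration::isTypedef): is the innermost one a typedef?
        // ArrayDeque.peek() returns null when empty, so the path check must come first.
        if (!declarationStack.isEmpty() && declarationStack.peek().isTypedef) {
            types.add(identifier);
        }
    }

    public static void main(String[] args) {
        ScopeCheck p = new ScopeCheck();
        p.declaration(true, "size_t");   // typedef unsigned long size_t;
        p.declaration(false, "count");   // int count;
        p.functionDefinition("strcpy");  // char *strcpy(...) { ... }
        System.out.println(p.types);     // [size_t]
    }
}
```

The first test answers a question about the path, yet it is repeated on
every call at run time; that is the mixing of generation-time and
run-time decisions described above.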

Being able to ask in the grammar about the path to a node may or may
not affect code generation; I'm not smart enough to say for sure. But
it does offer the opportunity to separate compile-time (generation)
decisions from run-time ones, in a hopefully target-language-independent
fashion. It would also simplify the custom rules that parameterized
productions will need.

In fact, being able to query the path to a node in the grammar may 
eliminate the need for the particular parameterization mentioned here. 
If the final node - "direct_declarator" - could choose from among the 
various paths, then the entire logic might be able to fit in that node. 
I'm not sure it would improve readability, though.
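A rough sketch of that alternative, again with hypothetical names:
instead of receiving a parameter, direct_declarator looks at the list of
rule names on the path that reached it (innermost last) and lets the
nearest interested context decide. Whether ANTLR could compute this at
generation time is exactly the open question; this version just does it
at run time.

```java
import java.util.List;

/** Hypothetical sketch: direct_declarator derives its behaviour from the path that reached it. */
class PathQuery {
    enum NameRule { REQUIRED, OPTIONAL, FORBIDDEN }

    /**
     * Walks the rule-invocation path from innermost to outermost and lets the
     * nearest context that cares about names decide. Rule names are illustrative.
     */
    static NameRule nameRuleFor(List<String> path) {
        for (int i = path.size() - 1; i >= 0; i--) {
            switch (path.get(i)) {
                case "type_name":             return NameRule.FORBIDDEN; // e.g. sizeof(char *)
                case "parameter_declaration": return NameRule.OPTIONAL;  // named or anonymous
                case "declaration":
                case "function_definition":   return NameRule.REQUIRED;
                default:                      break; // e.g. declarator: keep looking outward
            }
        }
        return NameRule.REQUIRED;
    }

    public static void main(String[] args) {
        // strcpy's prototype: the parameter context is nearer than the outer declaration.
        System.out.println(nameRuleFor(List.of(
                "declaration", "declarator", "parameter_declaration", "declarator"))); // OPTIONAL
        System.out.println(nameRuleFor(List.of("type_name", "abstract_declarator")));  // FORBIDDEN
    }
}
```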

Thanks,

=Austin



