[antlr-interest] Suggestion: Parameterized Productions
Austin Hastings
Austin_Hastings at Yahoo.com
Sat Oct 6 07:38:03 PDT 2007
This is a feature request/suggestion:
In a language like C, there are named type specifications and then
anonymous type specifications:
    /* Anonymous types used for parameter info */
    extern char *strcpy(char *, const char *);

    /* Explicit declarations required for function definition. */
    char *strcpy(char *to, const char *from)
    {
        char *ret = to;
        while ((*to++ = *from++) != '\0')
            ;
        return ret;
    }
The problem is that the grammar ends up with a rather complex section
for recognizing declarations, and then a second, parallel section for
recognizing anonymous declarations:
    declarator
        : pointer? direct_declarator
        | pointer
        ;

    abstract_declarator
        : pointer direct_abstract_declarator?
        | direct_abstract_declarator
        ;

    direct_declarator
        : ( IDENTIFIER
          | '(' declarator ')'
          )
          declarator_suffix*
        ;

    direct_abstract_declarator
        : ( '(' abstract_declarator ')' | abstract_declarator_suffix ) abstract_declarator_suffix*
        ;
(Excerpts from Terence's ANSI-C grammar.)
There are four related-but-slightly-different production trees in the C
grammar for simple variables, structures, parameters, and sizeof.
I propose that a parameterized production mechanism be added so that
this duplication can be eliminated. The ANTLR syntax would probably
resemble the existing parameter mechanism for generated functions. The
difference would be that these parameters would be handled internally by
ANTLR, rather than passed into the generated code.
The point would be to attach production subtrees, including empty ones,
within rules. This would allow re-use of the structures with a smaller
set of customized rules needed to handle special cases.
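To make the idea concrete outside of ANTLR, here is a minimal hand-written sketch in Java. It is not ANTLR syntax and not generated code. It only shows one recognizer routine taking a parameter that decides whether the name slot holds an IDENTIFIER or is empty, so the named and abstract declarator trees share one definition. The class DeclaratorSketch, the method matches, and the flag named are all hypothetical, the input is a pre-split token list, and the grammar is deliberately reduced to '*' prefixes, parentheses, and empty "[]" / "()" suffixes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one declarator routine, parameterized by "is a name required?". */
public class DeclaratorSketch {
    private final List<String> tokens;
    private int pos = 0;

    private DeclaratorSketch(List<String> tokens) { this.tokens = tokens; }

    /** Returns true if the whole token list is a declarator of the requested kind. */
    public static boolean matches(boolean named, String... tokens) {
        DeclaratorSketch p = new DeclaratorSketch(Arrays.asList(tokens));
        return p.declarator(named) && p.pos == p.tokens.size();
    }

    private String peek(int ahead) {
        int i = pos + ahead;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    private boolean accept(String expected) {
        if (peek(0).equals(expected)) { pos++; return true; }
        return false;
    }

    private static boolean isIdentifier(String t) {
        char c = t.charAt(0);
        return Character.isLetter(c) || c == '_';
    }

    // declarator[named] : '*'* direct[named]
    private boolean declarator(boolean named) {
        int start = pos;
        while (accept("*")) { /* pointer prefix */ }
        // An abstract declarator may omit the name but must not be empty.
        return direct(named) && (named || pos > start);
    }

    // The parameter selects the only part that differs between the two rule families.
    private boolean direct(boolean named) {
        if (peek(0).equals("(") && !peek(1).equals(")")) {
            pos++;                                   // '(' declarator[named] ')'
            if (!declarator(named) || !accept(")")) return false;
        } else if (named) {
            if (!isIdentifier(peek(0))) return false; // a name is required here
            pos++;
        }                                            // abstract case: name slot is empty
        suffixes();
        return true;
    }

    // Simplified suffixes: only empty "[]" and "()" are recognized.
    private void suffixes() {
        while (true) {
            if (peek(0).equals("[") && peek(1).equals("]")) pos += 2;
            else if (peek(0).equals("(") && peek(1).equals(")")) pos += 2;
            else return;
        }
    }
}
```

With this sketch, matches(true, "*", "to") and matches(false, "*") both succeed, while matches(false, "*", "to") and matches(true, "*") are rejected. The proposal would have ANTLR do the equivalent substitution at generation time instead of testing a flag at run time.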
Part of the customized rules would need to be a way to recognize
"context." This compares to the existing pattern of using "if
($declaration.size()>0)" to determine when a parent rule is active. It
seems there are two cases here: detection at generation-time of the path
to a node, and detection at run-time of the path to a node.
Currently, the same ANSI-C grammar includes this snippet:
    direct_declarator
        : ( IDENTIFIER
            {
            if ($declaration.size()>0 && $declaration::isTypedef) {
                $Symbols::types.add($IDENTIFIER.text);
                System.out.println("define type "+$IDENTIFIER.text);
            }
            }
          | '(' declarator ')'
          )
          declarator_suffix*
        ;
The code that checks $declaration.size() is checking the generation-time
path to the node, but doing it at run-time. The next condition, &&
$declaration::isTypedef, is checking a run-time indicator of which case
inside the declaration production is being followed.
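The two checks can be separated out in a small hand-written Java sketch. This is not ANTLR's generated code; the names ScopeSketch, DeclarationScope, declarationStack, onIdentifier, and declaration are hypothetical stand-ins, and the println from the grammar action is left out. The stack-emptiness test answers "did we get here through declaration?", which is the question that could in principle be settled at generation time, while the isTypedef flag is genuinely run-time data.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the two checks made by the action on IDENTIFIER. */
public class ScopeSketch {
    /** One frame per active invocation of the declaration rule. */
    public static final class DeclarationScope {
        public boolean isTypedef;
    }

    public final Deque<DeclarationScope> declarationStack = new ArrayDeque<>();
    public final Set<String> types = new HashSet<>();

    /** Mirrors the action attached to IDENTIFIER in direct_declarator. */
    public void onIdentifier(String name) {
        // Check 1: which path led to this node? (is a declaration rule active)
        boolean insideDeclaration = !declarationStack.isEmpty();
        // Check 2: which alternative was taken inside declaration? (run-time flag)
        if (insideDeclaration && declarationStack.peek().isTypedef) {
            types.add(name);
        }
    }

    /** Simulates entering declaration, seeing one declarator name, and leaving. */
    public void declaration(boolean sawTypedefKeyword, String name) {
        DeclarationScope scope = new DeclarationScope();
        scope.isTypedef = sawTypedefKeyword;
        declarationStack.push(scope);
        try {
            onIdentifier(name);
        } finally {
            declarationStack.pop(); // the frame disappears when the rule returns
        }
    }
}
```

In this sketch, an identifier seen through declaration(true, "size_t") is recorded as a type name, while one seen through declaration(false, "x") or through a direct onIdentifier call with no active declaration is not.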
The difference is that the production for a function declaration makes
reference to a declarator as part of its syntactic predicate: the only
difference between a declaration and a function definition is the
opening '{' versus the closing ';'.
Being able to ask in the grammar about the path to the node may or may
not affect code generation - I'm not smart enough to say for sure. But
it does offer the opportunity to separate compile-time (generation) from
run-time decisions, in a hopefully target-language-independent fashion.
And it simplifies the custom nodes that will be needed for parameterized
productions.
In fact, being able to query the path to a node in the grammar may
eliminate the need for the particular parameterization mentioned here.
If the final node - "direct_declarator" - could choose from among the
various paths, then the entire logic might be able to fit in that node.
I'm not sure it would improve readability, though.
Thanks,
=Austin