[antlr-interest] Every grammar an output grammar

Wed Apr 2 14:50:16 PDT 2008

One of the problems we hoped ANTLR 3 would solve was text output: generating text from ANTLR 2 was an unpleasant experience.  StringTemplate helps considerably, and one of the inchoate ideas that Ter was grappling with at the ANTLR 3 cabal was that of an "output grammar" that would provide a mapping between an ANTLR grammar and a template group.  This idea never quite gelled; the ANTLR 3 "template" output option was a first attempt to go in this direction.

Last week's "template generation" discussion came just as I was finishing a pretty printer for Yggdrasil and started me thinking about the problem again.  I finally came up with an approach that seems to provide a solution and have implemented it in Yggdrasil.  After trying other syntaxes, I ended up with an annotation that "mirrors" the template syntax:  a <foo> suffix assigns a token's text to a "foo" attribute, while <<bar>> references a bar template (it fills a slot that holds slots, i.e., a template).  This approach could easily be incorporated into the baseline ANTLR 3; the whole idea of output grammars seems to be a big step forward.  I have documented this for Yggdrasil as follows:

Output annotations and automated template generation

If the grammar option “buildText” is set to true, Yggdrasil will automatically output templates according to the model:
- String templates are decorated as trees mirroring the input elements of the grammar.  For each rule in a grammar, there may be a corresponding string template definition, either explicitly specified or (by default) with the same name as the rule.  Upon entry to a rule, the current template value is pushed onto a stack.  If there is a template for the rule, then an instance of that template is created and current is set to that instance; otherwise, the current template remains in effect.
- Unless otherwise specified, values are added to the “body” attribute of the current template.
- Grammar annotations for template building generally take the form of <key> or <<template>> suffixes.  A rule defined with the name foo<<bar>> has “bar” as the rule template (the syntax here is limited to a single argument).  A rule, token, or instantiated attribute reference of the form tok<t> assigns tok to the t attribute.  “<->” is the template equivalent of “!”.  Rule references have the additional form ruleRef<<templateName>> or <key<templateName>>; templateName overrides the template that would otherwise have been invoked.  If templateName is “-”, then no template is created for the invoked rule and text items are added to the current template.  [So “<->” is “do not add text” and “<<->>” is “do not invoke a new template”.]
- Syntactic predicates build templates, and the recognizer class tracks the last syntactic predicate.  Synpred templates are not added to the output template; they are tracked as an aid to debugging.
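To make the stack discipline and the annotation rules concrete, here is a minimal Python sketch of the model described above. It is not Yggdrasil's code: Template, TemplateBuilder, enter_rule, exit_rule and text are hypothetical names, templates are modeled as plain name-plus-attribute-list records rather than real StringTemplate instances, and it assumes (the description does not spell this out) that a template created for a rule is attached to the enclosing template when the rule exits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Union


@dataclass
class Template:
    # Hypothetical stand-in for a StringTemplate instance: a template name
    # plus named attributes, each holding a list of values (text or templates).
    name: str
    attrs: Dict[str, List[Union[str, "Template"]]] = field(default_factory=dict)

    def add(self, key: str, value: Union[str, "Template"]) -> None:
        self.attrs.setdefault(key, []).append(value)


class TemplateBuilder:
    """Sketch of the template stack described above; not Yggdrasil's code."""

    def __init__(self, group: Set[str], root: str = "root") -> None:
        self.group = group             # template names defined in the group
        self.current = Template(root)  # the active template
        self.stack: List[Template] = []

    def enter_rule(self, rule: str, override: Optional[str] = None) -> None:
        # Push the current template.  The rule's template defaults to the rule
        # name; rule<<t>> or ruleRef<<t>> passes override=t, and "-" means
        # "do not invoke a new template".
        self.stack.append(self.current)
        name = rule if override is None else override
        if name != "-" and name in self.group:
            self.current = Template(name)
        # Otherwise the enclosing template stays in effect.

    def exit_rule(self, key: str = "body") -> None:
        # Assumption: a template created for the rule is attached to the
        # enclosing template, under "body" unless another key was given.
        finished = self.current
        self.current = self.stack.pop()
        if finished is not self.current:
            self.current.add(key, finished)

    def text(self, value: str, key: str = "body", suppress: bool = False) -> None:
        # tok<key> assigns to `key`; tok<-> (suppress=True) adds nothing.
        if not suppress:
            self.current.add(key, value)
```

In this sketch, rule<<ruleDef>> corresponds to enter_rule("rule", override="ruleDef"), RULE_REF<name> to text(..., key="name"), OR<-> to text(..., suppress=True), and ruleRef<<->> to enter_rule(..., override="-"); the <key<templateName>> form would presumably combine an override on entry with exit_rule(key=...).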
Within a rule,
“@$<>” references the active template and can be assigned
to a Text attribute.  Grammar attributes are not added to the output
template except through actions or through the attribute algebra.
Given this scheme,
any grammar can become an output grammar, and a variety of template
groups can be built in order to do such things as build text for
displaying a parse tree, build XML output, pretty print, and the
like.  Parse tree and XML output forms can be built automatically. 
To build a pretty printer or other customized output format, the
natural approach is to start with a parse tree format, then fill out
individual templates and annotate the grammar.  For a rule named
“rule”, one parse tree template is
rule(body) ::= <<
"rule"
	<body; separator="\n">
>>
and this template
can be easily generated for all rules in a grammar.
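Generating that parse-tree template for every rule is mechanical. A minimal Python sketch, assuming the rule names have already been collected from the grammar (how they are obtained is left open here); parse_tree_template and parse_tree_group are hypothetical helper names:

```python
from typing import Iterable


def parse_tree_template(rule: str) -> str:
    # One parse-tree template per rule, following the shape shown above:
    # the rule name, then the body elements one per line, indented one tab.
    return f'{rule}(body) ::= <<\n"{rule}"\n\t<body; separator="\\n">\n>>\n'


def parse_tree_group(rules: Iterable[str]) -> str:
    # Concatenate the per-rule templates into a single group text.
    return "\n".join(parse_tree_template(r) for r in rules)


if __name__ == "__main__":
    print(parse_tree_group(["rule", "block", "alternative"]))
```

The generated group is only a starting point: per the approach above, one would then edit individual templates and annotate the grammar to get a customized format.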
Example output grammar

This is a simplified subset of some definitions from antlr.g:

rule<<ruleDef>>
	:
	(
		DOC_COMMENT
	)?
	(
		"protected"
		|
			"public"
		|
			"private"
		|
			"fragment"
	)?
	(
		RULE_REF<name>
		|
			TOKEN_REF<name>
	)
	COLON
	block
	SEMI
	;

block
	:
	alternative
	(
		OR<-> alternative<<orAlt>>
	)*
	;

and this is the corresponding template set for a pretty printer:

ruleDef(name, body) ::= <<
<name>
	:
	<body; separator="\n">
	;
>>

block(body, suffix) ::= <<
(
	<body; separator="\n">
)<suffix>
>>

alternative(body) ::= <<
<body; separator="\n">
>>

orAlt(body) ::= <<
|
	<body; separator="\n">
>>

Note that a fairly minimal annotation of the .g file maps the
input onto a rich template set; that is, the development effort for
producing a pretty printer is focused primarily on building
templates, not on annotating the grammar.
