[antlr-interest] Every grammar an output grammar
Loring Craymer
lgcraymer at yahoo.com
Wed Apr 2 14:50:16 PDT 2008
One of the problems that we hoped ANTLR 3 would solve was text output: generating text from ANTLR 2 was an unpleasant experience. StringTemplate helps considerably, and one of the inchoate ideas that Ter was grappling with at the ANTLR 3 cabal was an "output grammar" that would provide a mapping between an ANTLR grammar and a template group. This idea never quite gelled; the ANTLR 3 "template" output option was a first attempt to go in this direction.
Last week's "template generation" discussion came just as I was finishing a pretty printer for Yggdrasil, and it started me thinking about the problem again. I finally came up with an approach that seems to provide a solution, and I have implemented it in Yggdrasil. After trying other syntaxes, I ended up with an annotation that "mirrors" the template syntax: a <foo> suffix assigns a token's text to a "foo" attribute, while <<bar>> references a bar template (it fills a slot with something that itself holds slots, i.e., a template). This approach could easily be incorporated into baseline ANTLR 3, and the whole idea of output grammars seems to be a big step forward. I have documented this for Yggdrasil as follows:
Output annotations and automated template generation

If the grammar option “buildText” is set to true, Yggdrasil will automatically output templates according to the following model:

String templates are decorated as trees mirroring the input elements of the grammar. For each rule in a grammar, there may be a corresponding string template definition, either explicitly specified or (by default) with the same name as the rule. Upon entry to a rule, the current template value is pushed onto a stack. If there is a template for the rule, then an instance of that template is created and current is set to that instance; otherwise, the current template remains in effect.
Unless otherwise specified, values are added to the “body” attribute of the current template.
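To make the stack model concrete, here is a minimal Python sketch of it. It is not Yggdrasil's code: TemplateInstance and TemplateBuilder are hypothetical names, a template is reduced to a name plus a dictionary of multi-valued attributes, and the sketch assumes that a newly created rule template is added to the enclosing template's “body” attribute.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateInstance:
    """Hypothetical stand-in for a StringTemplate instance."""
    name: str
    attrs: dict = field(default_factory=dict)

    def add(self, key, value):
        # Attributes are multi-valued: repeated adds accumulate in a list.
        self.attrs.setdefault(key, []).append(value)


class TemplateBuilder:
    """Sketch of the buildText model: a stack of 'current' templates."""

    def __init__(self, rule_templates):
        # rule name -> template name; a rule with no entry gets no template
        self.rule_templates = rule_templates
        self.stack = []
        self.current = TemplateInstance("root")

    def enter_rule(self, rule):
        # On rule entry, push the current template.
        self.stack.append(self.current)
        tname = self.rule_templates.get(rule)
        if tname is not None:
            inst = TemplateInstance(tname)
            # Assumption: the new instance is nested in its parent's body.
            self.current.add("body", inst)
            self.current = inst
        # Otherwise the current template remains in effect.

    def exit_rule(self):
        # On rule exit, restore the template that was current on entry.
        self.current = self.stack.pop()

    def add_text(self, text, key="body"):
        # Unless otherwise specified, values go to the "body" attribute.
        self.current.add(key, text)
```

With a map such as {"block": "block", "alternative": "alternative"}, entering block, then alternative, and adding the text "x" leaves root.body = [block], block.body = [alternative] and alternative.body = ["x"]; a rule with no entry in the map simply adds its text to whichever template is current.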
Grammar annotations for template building generally take the form of <key> or <<template>> suffixes. A rule defined with the name foo<<bar>> has “bar” as the rule template (the syntax here is limited to a single argument). A rule, token, or instantiated attribute reference of the form tok<t> assigns tok to the t attribute. “<->” is the template equivalent of “!”. Rule references have the additional forms ruleRef<<templateName>> or <key<templateName>>; templateName overrides the template that would otherwise have been invoked. If templateName is “-”, then no template is created for the invoked rule, and text items are added to the current template. [So “<->” means “do not add text” and “<<->>” means “do not invoke a new template”.]
Syntactic predicates build templates, and the recognizer class tracks the last syntactic predicate. Synpred templates are not added to the output template; they are tracked as an aid to debugging.
Within a rule, “@$<>” references the active template and can be assigned to a Text attribute. Grammar attributes are not added to the output template except through actions or through the attribute algebra.
Given this scheme, any grammar can become an output grammar, and a variety of template groups can be built in order to do such things as build text for displaying a parse tree, build XML output, pretty print, and the like. Parse tree and XML output forms can be built automatically. To build a pretty printer or other customized output format, the natural approach is to start with a parse tree format, then fill out individual templates and annotate the grammar. For a rule named “rule”, one parse tree template is
rule(body) ::= <<
"rule"
<body; separator="\n">
>>
and this template can be easily generated for all rules in a grammar.
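Generating that parse tree template for every rule is mechanical. As a sketch, assuming the rule names are already available as a list (for example, collected from a parsed grammar), a hypothetical parse_tree_group function can emit one template per rule in StringTemplate group syntax:

```python
def parse_tree_group(rule_names):
    """Emit one parse-tree template per rule (hypothetical helper).

    Each template prints the rule name as a quoted label, followed by the
    rule's body values, one per line.
    """
    chunks = []
    for name in rule_names:
        chunks.append(
            f"{name}(body) ::= <<\n"
            f'"{name}"\n'
            f'<body; separator="\\n">\n'  # "\\n" emits a literal \n for ST
            f">>\n"
        )
    # Separate consecutive templates with a blank line.
    return "\n".join(chunks)


print(parse_tree_group(["rule", "block"]))
```

The output can then be edited template by template, which is the "start with a parse tree format, then fill out individual templates" workflow described above.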
Example output grammar

This is a simplified subset of some definitions from antlr.g:
rule<<ruleDef>>
    :   (DOC_COMMENT)?
        ("protected" | "public" | "private" | "fragment")?
        (RULE_REF<name> | TOKEN_REF<name>)
        COLON
        block
        SEMI
    ;

block
    :   alternative (OR<-> alternative<<orAlt>>)*
    ;
and this is the corresponding template set for a pretty printer:
ruleDef(name, body) ::= <<
<name>
:
<body; separator="\n">
;
>>

block(body, suffix) ::= <<
(
<body; separator="\n">
)<suffix>
>>

alternative(body) ::= <<
<body; separator="\n">
>>

orAlt(body) ::= <<
|
<body; separator="\n">
>>
Note that a fairly minimal annotation of the .g file maps the
input onto a rich template set; that is, the development effort for
producing a pretty printer is focused primarily on building
templates, not on annotating the grammar.
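To see what the pretty-printer templates produce, the following Python sketch mirrors each template as a function that joins its multi-valued attribute with newlines, as separator="\n" does. The template tree is filled in by hand for a hypothetical input rule "expr : atom | NUM ;", standing in for what the annotated parser would build. It assumes StringTemplate ignores the newline right after << and right before >>, and that block's suffix attribute would carry an EBNF operator such as "?", "*" or "+".

```python
# Each function mirrors one template from the pretty-printer group.
# Assumption: the newline right after << and right before >> is not part
# of a template's output, so each function returns only the lines between.

def sep(items):
    # <body; separator="\n">: join the attribute's values with newlines
    return "\n".join(items)


def ruleDef(name, body):
    return f"{name}\n:\n{sep(body)}\n;"


def block(body, suffix=""):
    # suffix is assumed to hold an EBNF operator such as "?", "*" or "+"
    return f"(\n{sep(body)}\n){suffix}"


def alternative(body):
    return sep(body)


def orAlt(body):
    # OR<-> drops the "|" token text; this template re-supplies it.
    return f"|\n{sep(body)}"


# Hand-built tree for the hypothetical input rule:  expr : atom | NUM ;
text = ruleDef("expr", [block([alternative(["atom"]), orAlt(["NUM"])])])
print(text)
```

This prints expr, :, (, atom, |, NUM, ) and ; each on its own line. Changing the layout of the output (indentation, putting alternatives on one line) means editing only these templates, not the grammar.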