[antlr-interest] terminology: "protected"
Gregg Reynolds
dev at arabink.com
Thu Jan 12 03:02:55 PST 2006
Hi,
I'm not sure if this is the right place for it, but here's a suggestion
to remedy the nomenclature of "protected" rules, which, as Mr. Parr
notes, "sucks". Actually it doesn't seem all that sucky to me, but I
think a better term is "splice".
Also I suggest a splop (splicing operator): ','. That's right, from
our ol' pal Lisp. For antlr, I propose that splicing means essentially
macro expansion without tokenization. For example,
    if FOO = X Y Z then
    A ,FOO C  =>  A X Y Z C
Same result for FOO = (X Y Z); i.e. splicing "opens" a list and merges
its contents.
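A minimal sketch of that expansion in Python, just to pin down the intended behaviour. The representation is an assumption made for illustration only: a rule body is a plain list of symbol names, and a `(",", name)` pair stands for `,name`. Nothing here reflects how antlr itself works.

```python
# Hypothetical model: a rule body is a list of symbols, and the pair
# (",", "FOO") stands for the proposed splice ",FOO".
def expand(body, rules):
    """Return body with every spliced rule replaced inline by its symbols."""
    out = []
    for sym in body:
        if isinstance(sym, tuple) and sym[0] == ",":
            # Splice: expand the referenced rule and merge its symbols into
            # the surrounding list (no nesting, no separate token).
            out.extend(expand(rules[sym[1]], rules))
        else:
            out.append(sym)
    return out

rules = {"FOO": ["X", "Y", "Z"]}
print(expand(["A", (",", "FOO"), "C"], rules))  # ['A', 'X', 'Y', 'Z', 'C']
```

Since a rule body is already a list in this model, FOO = X Y Z and FOO = (X Y Z) are the same thing, which matches the "opens a list" reading.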
Example from the Ref Man:
    STRING: '"' ( ESCAPE | ~('"'|'\\') )* '"' ;

    protected
    ESCAPE
        : '\\'
          ( 'n' { $setText("\n"); }
          | 'r' { $setText("\r"); }
          | 't' { $setText("\t"); }
          | '"' { $setText("\""); }
          )
        ;
This use would become simply:
    STRING: '"' ( ,ESCAPE | ~('"'|'\\') )* '"' ;
which tells the reader explicitly that we're dealing with a
non-tokenizing splice (no need to look up the defn).
The defn becomes
,ESCAPE : ... or splicer ESCAPE : ... or the like. I rather prefer
the first alternative.
This gives us a nice metalanguage: if
    A = B ,C D
then we can say that A _splices_ C, and _delegates to_ B and D.
"Delegates" instead of "calls", since grammar productions are not
functions, and it is highly desirable (IMHO) to avoid dependency on
both implementation language (Java) and procedural language in general
in a metalanguage for grammars. (FWIW, this all springs from a
nearly-complete attempt to express a kind of abstract syntax and formal
semantics of a language corresponding to antlr and StringTemplate.
Which e.g. removes dependency on Java, object-oriented stuff, procedural
thinking, etc. )
This (possibly) has the additional virtue of grammatical unification,
since the same concept and notation work for Antlr "standard" grammars
as noted above, StringTemplates, and maybe also Tree grammars.
For templates, "abc $foo()$ def" becomes "abc ,foo def", where ",foo"
has the same meaning as above: syntactically splice (macro expand) foo
in place. A big win (IMO) is that this eliminates the current reliance
on notions of function calling. Here's an example from the ST refman:
    <html>
    <body>
    ...
    $searchbox()$
    ...
    </body>
    </html>
With splop this becomes
    <html>
    <body>
    ...
    ,searchbox
    ...
    </body>
    </html>
Nice and clean. No fuss no muss, and no alien notion of calling a
function. (Obviously more detail is required, e.g. param handling, but
I'll save that for a page to be posted to the web soonish. Or laterish.)
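To make the template case concrete, here is a rough Python sketch of splicing as plain textual expansion. It is not StringTemplate's API. The `templates` dictionary, the placeholder text for `foo`, and the rule that a splice is a comma at the start of a word followed by an identifier are all assumptions made for the example. Parameters are ignored entirely.

```python
import re

# Assumption: ",name" counts as a splice only when the comma starts a word
# (start of text or preceded by whitespace), so ordinary commas are untouched.
SPLICE = re.compile(r"(?<!\S),(\w+)")

def expand_template(text, templates):
    """Replace each ',name' with the named template's text, recursively."""
    def repl(match):
        name = match.group(1)
        if name not in templates:
            return match.group(0)  # unknown name: leave the text as written
        return expand_template(templates[name], templates)
    return SPLICE.sub(repl, text)

# "XYZ" is a made-up body for the hypothetical template foo.
print(expand_template("abc ,foo def", {"foo": "XYZ"}))  # abc XYZ def
```

A template that splices itself would recurse forever in this sketch; a real design would need to rule that out.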
For trees it's a bit hairier, since it isn't clear what it means to
splice a subtree into a tree. Or, the meaning isn't always intuitively
obvious, as it is in the case of string (list) splicing.
For example, splicing into a list qua list is obvious (see Lisp), but if
the list is construed as a tree representation, it isn't so obvious
sometimes. Consider
    (A ,(B C D) E)
Syntactic (lexical?) splice results in (A B C D E); simply a list merge.
But if these lists represent trees, such a lexical merge has the
effect of hoisting the children of B to become its siblings. Probably
not the intention, or at least a violation of the Principle of Least
Surprise. I haven't figured out yet exactly how splicing should be
construed in a tree grammar.
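The hoisting effect is easy to see in a small Python sketch. The model is an assumption for illustration: a tree is a nested list whose first element is the root (the Lisp convention), and a hypothetical `Splice` wrapper stands for `,(...)`.

```python
# Hypothetical model: a tree is a nested list [root, child, ...], and
# Splice([...]) stands for the spliced form ",(...)".
class Splice:
    def __init__(self, items):
        self.items = items

def splice(lst):
    """Purely lexical splice: merge each Splice's items into the parent list."""
    out = []
    for item in lst:
        if isinstance(item, Splice):
            out.extend(splice(item.items))
        elif isinstance(item, list):
            out.append(splice(item))
        else:
            out.append(item)
    return out

def children(tree):
    """Read a list as a tree: everything after the first element is a child."""
    return tree[1:]

unspliced = ["A", ["B", "C", "D"], "E"]
spliced = splice(["A", Splice(["B", "C", "D"]), "E"])

print(spliced)              # ['A', 'B', 'C', 'D', 'E']
print(children(unspliced))  # [['B', 'C', 'D'], 'E']
print(children(spliced))    # ['B', 'C', 'D', 'E']
```

Before the splice, A has two children, the subtree rooted at B and the leaf E. After it, C and D sit next to B as children of A, which is the surprise described above.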
Some other questions: what does (A (B ,C D) E) mean? (A ,(B ,C D) E)?
Apologies for going on so long. I have just one more proposal: call
the tree grammar operator '^' the "hoisting operator", and change the
semantics to eliminate surprises. Specifically, scope hoisting
semantics to the local environment by default. E.g.
    expr MULT^ (ID EXPONENT^ expr)
should produce MULT expr (EXPONENT ID expr). Possible objection: what
about ID ( PLUS^ ID )* ?? Possible answer: first, each group is matched
in a new local environment; then, to hoist the PLUS out of the local
scope, we simply use our new friend SPLOP:
    ID ( ,PLUS^ ID)*
The splice operator moves (hoists) PLUS outside of the local scope, and
the hoisting op moves it to the root of its new local scope. The parens
retain their meaning for matching purposes. (Just an idea; I
haven't thought it all the way through yet.)
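A rough Python sketch of the locally scoped part only, under assumptions made for illustration: a group is a list, a hypothetical `Root` wrapper stands for a `^`-marked item, and a group with no marked item simply merges its children into the enclosing group. The `,PLUS^` case and the `*` repetition are left out, since their semantics are not pinned down above.

```python
# Hypothetical model: a parenthesized group is a Python list, and
# Root("MULT") stands for "MULT^".
class Root:
    def __init__(self, name):
        self.name = name

def scan(group):
    """Return (root, children) for one group; '^' affects only this group."""
    root, kids = None, []
    for item in group:
        if isinstance(item, Root):
            root = item.name  # if several are marked, the last wins here
        elif isinstance(item, list):
            sub_root, sub_kids = scan(item)  # new local scope
            if sub_root is None:
                kids.extend(sub_kids)  # rootless group merges into its parent
            else:
                kids.append([sub_root] + sub_kids)  # rooted group is a subtree
        else:
            kids.append(item)
    return root, kids

def build(group):
    """Build a nested-list tree [root, child, ...] from a group."""
    root, kids = scan(group)
    return kids if root is None else [root] + kids

tree = build(["expr", Root("MULT"), ["ID", Root("EXPONENT"), "expr"]])
print(tree)  # ['MULT', 'expr', ['EXPONENT', 'ID', 'expr']]
```

The result corresponds to MULT expr (EXPONENT ID expr): EXPONENT becomes the root of its own group and does not escape to the outer one.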
Whew. Rather more detail than I realized - it all seems so short and
simple in my head. Most of the above I've thought through and I think
it works, but you may find some holes. I hope somebody finds this
interesting/useful.
Thanks for your indulgence,
gregg reynolds
P.S. I make no claim as to the practicality of implementing the above.
That's what programmers are for. ;)
P.P.S. multitudinous thanks to the Supreme Dictator for gracing a
heartless and indifferent world with the blessing of Antlr+ST. My brain
has been quite inflamed with visions of languages and grammars and such
for the past two weeks or so - I begin to fear Human Spontaneous
Combustion. Not so much because Antlr+ST is useful (I do plan to use
it), but because there's so much food for thought there - it fired my
imagination. (Now I've got approx 2 million ideas of which approx 9 are
good, plus or minus 7.) The insight that parse grammars and output
templates are essentially the same is quite brilliant and startling (for
me at least). So I've begun to think about both in completely different
(and deeper) ways thanks to Antlr+ST, which is just about the highest
compliment one can give.
But you have to come up with something better than "StringTemplate". No
sizzle to that at all. Maybe "astre"? "Another String Template $R
Engine", where R is up to the reader - Resolver, Regurgitator, etc. Or
"Another String Template Rubbish Extruder"?