[antlr-interest] terminology: "protected"

Gregg Reynolds dev at arabink.com
Thu Jan 12 03:02:55 PST 2006


Hi,

I'm not sure if this is the right place for it, but here's a suggestion 
to remedy the nomenclature of "protected" rules, which, as Mr. Parr 
notes, "sucks".  Actually it doesn't seem all that sucky to me, but I 
think a better term is "splice".

Also I suggest a splop (splicing operator):  ','.  That's right, from 
our ol' pal Lisp.  For antlr, I propose that splicing means essentially 
macro expansion without tokenization.  For example,

if FOO = X Y Z then

	A ,FOO C  =>  A X Y Z C

Same result for FOO = (X Y Z); i.e. splicing "opens" a list and merges 
its contents.
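
Here is a minimal Python sketch of this reading of splicing, for illustration only. The `splice` function, the `defs` table, and the representation of a rule body as a list of symbol strings (with `,NAME` marking a splice) are all assumptions and are not part of Antlr. `FOO = X Y Z` and `FOO = (X Y Z)` are both modeled as the same list, because splicing opens the group anyway.

```python
from typing import Dict, List

def splice(seq: List[str], defs: Dict[str, List[str]]) -> List[str]:
    """Expand every ',NAME' symbol by merging the contents of defs[NAME] in place.

    Hypothetical sketch: nothing is tokenized, the definition's symbols are
    simply copied into the surrounding sequence. Nested splices are expanded
    recursively; there is no cycle detection.
    """
    out: List[str] = []
    for sym in seq:
        if sym.startswith(","):
            out.extend(splice(defs[sym[1:]], defs))
        else:
            out.append(sym)
    return out

defs = {"FOO": ["X", "Y", "Z"]}
print(splice(["A", ",FOO", "C"], defs))  # ['A', 'X', 'Y', 'Z', 'C']
```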

Example from the Ref Man:

STRING: '"' ( ESCAPE | ~('"'|'\\') )* '"' ;

protected
ESCAPE
     :    '\\'
          ( 'n' { $setText("\n"); }
          | 'r' { $setText("\r"); }
          | 't' { $setText("\t"); }
          | '"' { $setText("\""); }
          )
     ;

This use would become simply:

STRING: '"' ( ,ESCAPE | ~('"'|'\\') )* '"' ;

which tells the reader explicitly that we're dealing with a 
non-tokenizing splice (no need to look up the defn).

The defn becomes

,ESCAPE : ...   or  splicer ESCAPE : ...   or the like.  I rather prefer 
the first alternative.
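
To show what a non-tokenizing splice of ESCAPE amounts to, here is a hand-written Python matcher for the STRING rule with ESCAPE's alternatives expanded in place. It is a sketch, not what Antlr would generate. The names `scan_string` and `ESCAPES` are hypothetical, and the sketch assumes the token text keeps the surrounding quotes, because the rule as written does not exclude them.

```python
from typing import Tuple

# Translations performed by the (spliced) ESCAPE rule in the example above.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", '"': '"'}

def scan_string(src: str, pos: int = 0) -> Tuple[str, int]:
    r"""Match STRING : '"' ( ,ESCAPE | ~('"'|'\\') )* '"' starting at src[pos].

    Returns (token_text, end_position). The ESCAPE alternative is matched
    inline and contributes only its translated character to STRING's text;
    no separate ESCAPE token is ever produced.
    """
    if pos >= len(src) or src[pos] != '"':
        raise SyntaxError("expected opening quote")
    text = ['"']
    pos += 1
    while pos < len(src):
        ch = src[pos]
        if ch == '"':
            text.append('"')
            return "".join(text), pos + 1
        if ch == "\\":
            # ,ESCAPE : '\\' ( 'n' | 'r' | 't' | '"' ), spliced in place
            nxt = src[pos + 1] if pos + 1 < len(src) else ""
            if nxt not in ESCAPES:
                raise SyntaxError(f"bad escape at position {pos}")
            text.append(ESCAPES[nxt])
            pos += 2
        else:
            text.append(ch)
            pos += 1
    raise SyntaxError("unterminated string")
```

Only one token, STRING, comes out. The escape sequence never exists as a token of its own, and the `,ESCAPE` notation is meant to make that visible at the point of use.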

This gives us a nice metalanguage:  if

	A = B ,C D

then we can say that A _splices_ C, and _delegates to_ B and D. 
"Delegates" instead of "calls", since grammar productions are not 
functions, and it is highly desirable (IMHO) to avoid dependency on 
both the implementation language (Java) and procedural language in general 
in a metalanguage for grammars.  (FWIW, this all springs from a 
nearly-complete attempt to express a kind of abstract syntax and formal 
semantics of a language corresponding to antlr and StringTemplate, 
one that removes dependency on Java, object-oriented stuff, procedural 
thinking, etc.)
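
The splices/delegates vocabulary can be read directly from a production's right-hand side. Here is a toy Python sketch. It assumes a production is given as a list of symbol strings with `,X` marking a splice, and `relations` is a hypothetical name.

```python
from typing import Dict, List

def relations(rhs: List[str]) -> Dict[str, List[str]]:
    """Classify the symbols on a production's right-hand side.

    ',X' means the production splices X; any other symbol is one it delegates to.
    """
    return {
        "splices": [s[1:] for s in rhs if s.startswith(",")],
        "delegates_to": [s for s in rhs if not s.startswith(",")],
    }

# A = B ,C D
print(relations(["B", ",C", "D"]))
# {'splices': ['C'], 'delegates_to': ['B', 'D']}
```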

This (possibly) has the additional virtue of grammatical unification, 
since the same concept and notation work for Antlr "standard" grammars 
as noted above, StringTemplates, and maybe also Tree grammars.

For templates,  "abc $foo() def" becomes "abc ,foo def", where ",foo" 
has the same meaning as above: syntactically splice (macro expand) foo 
in place.  A big win (IMO) is that this eliminates the current reliance 
on notions of function calling.  Here's an example from the ST refman:

<html>
<body>
...
$searchbox()$
...
</body>
</html>

With splop this becomes

<html>
<body>
...
,searchbox
...
</body>
</html>

Nice and clean.  No fuss no muss, and no alien notion of calling a 
function.  (Obviously more detail is required, e.g. param handling, but 
I'll save that for a page to be posted to the web soonish.  Or laterish.)
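
Here is a rough Python sketch of template splicing as plain text substitution. It assumes templates are stored in a dict keyed by name and that `,` followed immediately by an identifier always denotes a splice. There is no escape mechanism, so ordinary text like `a,b` would be misread. `expand` and `SPLICE` are hypothetical names, and parameters are deliberately left out, as in the text above.

```python
import re
from typing import Dict, Tuple

# Assumption: ',' immediately followed by an identifier is always a splice.
# There is no escaping, so ordinary text like "a,b" would also be treated as one.
SPLICE = re.compile(r",([A-Za-z_]\w*)")

def expand(name: str, templates: Dict[str, str], active: Tuple[str, ...] = ()) -> str:
    """Return templates[name] with every ',other' replaced by the expansion of 'other'."""
    if name in active:
        raise ValueError(f"recursive splice of {name!r}")
    body = templates[name]
    # Each spliced template is itself expanded, so nested splices work.
    return SPLICE.sub(lambda m: expand(m.group(1), templates, active + (name,)), body)

templates = {
    "page": "<html>\n<body>\n...\n,searchbox\n...\n</body>\n</html>",
    "searchbox": '<form action="/search"><input name="q"></form>',
}
print(expand("page", templates))
print(expand("t", {"t": "abc ,foo def", "foo": "FOO"}))  # abc FOO def
```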

For trees it's a bit hairier, since it isn't clear what it means to 
splice a subtree into a tree.  Or, the meaning isn't always intuitively 
obvious, as it is in the case of string (list) splicing.

For example, splicing into a list qua list is obvious (see Lisp), but if 
the list is construed as a tree representation, it isn't so obvious 
sometimes.  Consider

	(A ,(B C D) E)

Syntactic (lexical?) splice results in (A B C D E); simply a list merge. 
But if these lists represent trees, such a lexical merge has the 
effect of hoisting the children of B (C and D) to become B's siblings. 
Probably not the intention, or at least a violation of the Principle of 
Least Surprise.  I haven't figured out yet exactly how splicing should be 
construed in a tree grammar.
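
The problem shows up clearly if the notation is modeled directly. In this Python sketch (all names hypothetical), a tree is a list whose first element is the root, and a splice `,x` is written as the marker tuple `(",", x)`. Splicing purely lexically, as a list merge, silently changes the shape of the tree:

```python
from typing import Union

# A node is a leaf symbol (str), a tree written as a list [root, child, ...],
# or a splice marker (",", node) standing for ",node" in the notation above.
Node = Union[str, list, tuple]

def lexical_splice(node: Node) -> Node:
    """Remove splice markers by merging the spliced element(s) into the enclosing list."""
    if not isinstance(node, list):
        return node
    out: list = []
    for item in node:
        if isinstance(item, tuple) and item[0] == ",":
            body = lexical_splice(item[1])
            if isinstance(body, list):
                out.extend(body)   # open the list: its elements join the parent
            else:
                out.append(body)   # splicing a leaf just inserts it
        else:
            out.append(lexical_splice(item))
    return out

def children(tree: Node) -> list:
    """Read a list as a tree: the first element is the root, the rest are its children."""
    return tree[1:] if isinstance(tree, list) else []

after = lexical_splice(["A", (",", ["B", "C", "D"]), "E"])
print(after)            # ['A', 'B', 'C', 'D', 'E']
print(children(after))  # ['B', 'C', 'D', 'E']: C and D are now siblings of B
```

The same function flattens `(A ,(B ,C D) E)` to the same `['A', 'B', 'C', 'D', 'E']`, so under the lexical reading the inner splice makes no visible difference there.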

Some other questions:  what does (A (B ,C D) E) mean?  (A ,(B ,C D) E)?

Apologies for going on so long.  I have just one more proposal:  call 
the tree grammar operator '^' the "hoisting operator", and change the 
semantics to eliminate surprises.  Specifically, scope hoisting 
semantics to the local environment by default.  E.g.

	expr MULT^ (ID EXPONENT^ expr)

should produce MULT expr (EXPONENT ID expr).  Possible objection:  what 
about ID ( PLUS^ ID )* ??  Possible answer: first, each group is matched 
in a new local environment; then, to hoist the PLUS out of the local 
scope, we simply use our new friend SPLOP:

	ID ( ,PLUS^ ID)*

The splice operator moves (hoists) PLUS outside of the local scope, and 
the hoisting op moves it to the root of its new local scope.  The parens 
still retain their meaning for matching purposes.  (Just an idea; I 
haven't thought it all the way through yet.)
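
One way to pin down the proposed local-scope hoisting is with a small tree builder. In this Python sketch (names and encoding are assumptions), a token ending in `^` is a root operator, and a nested Python list stands for a parenthesized group. Each group is built in its own scope. A rooted group becomes a single subtree, and an unrooted group contributes plain siblings. When a second `^` appears in the same scope, the tree built so far becomes the new root's first child. Loops (`*`) and the `,PLUS^` splice are not modeled.

```python
from typing import List, Union

Item = Union[str, list]   # a token, or a parenthesized group (new local scope)

def build(items: List[Item]) -> Union[tuple, list]:
    """Build an AST under the proposed local-scope hoisting semantics.

    Returns a tuple (root, child, ...) if the scope contains a '^' token,
    otherwise a plain list of sibling nodes.
    """
    root = None
    children: list = []
    for item in items:
        if isinstance(item, list):
            sub = build(item)              # each group is matched in its own scope
            if isinstance(sub, tuple):
                children.append(sub)       # a rooted group stays one subtree
            else:
                children.extend(sub)       # an unrooted group contributes siblings
        elif item.endswith("^"):
            if root is not None:
                # a second root in the same scope adopts the tree built so far
                children = [(root, *children)]
            root = item[:-1]
        else:
            children.append(item)
    return (root, *children) if root is not None else children

print(build(["expr", "MULT^", ["ID", "EXPONENT^", "expr"]]))
# ('MULT', 'expr', ('EXPONENT', 'ID', 'expr'))
```

The objection case, `build(["ID", ["PLUS^", "ID"]])`, gives `['ID', ('PLUS', 'ID')]`. The PLUS stays inside its own scope and never takes the first ID as a child. This is the gap the `,PLUS^` idea is meant to close.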

Whew.  Rather more detail than I realized - it all seems so short and 
simple in my head.  Most of the above I've thought through and I think 
it works, but you may find some holes.  I hope somebody finds this 
interesting/useful.

Thanks for your indulgence,

gregg reynolds

P.S.  I make no claim as to the practicality of implementing the above. 
  That's what programmers are for.  ;)

P.P.S.  multitudinous thanks to the Supreme Dictator for gracing a 
heartless and indifferent world with the blessing of Antlr+ST.  My brain 
has been quite inflamed with visions of languages and grammars and such 
for the past two weeks or so - I begin to fear Spontaneous Human 
Combustion.  Not so much because Antlr+ST is useful (I do plan to use 
it), but because there's so much food for thought there - it fired my 
imagination.  (Now I've got approx 2 million ideas of which approx 9 are 
good, plus or minus 7.)  The insight that parse grammars and output 
templates are essentially the same is quite brilliant and startling (for 
me at least).  So I've begun to think about both in completely different 
(and deeper) ways thanks to Antlr+ST, which is just about the highest 
compliment one can give.

But you have to come up with something better than "StringTemplate".  No 
sizzle to that at all.  Maybe "astre"?  "Another String Template $R 
Engine", where R is up to the reader - Resolver, Regurgitator, etc.  Or 
"Another String Template Rubbish Extruder"?


