[antlr-interest] terminology: "protected"

Thu Jan 12 04:28:00 PST 2006

Martin Probst wrote:
> I agree that "protected" is a bad wording, but I don't quite agree with
> your splicing thing too. Basically your requiring the user to specify
> within the rule that is using the "spliced" rule that the rule is
> spliced. 

Hello Martin,

Thanks for the feedback; comments below.
> 
>>which tells the reader explicitly that we're dealing with a 
>>non-tokenizing splice (no need to look up the defn).
> 
> 
> I don't see why that would be desirable. To me, a "spliced" rule is
> nothing special, except that it's internal to the lexer (maybe that
> would be a much nicer wording?), and cannot be directly accessed from
> the parser.

I guess it depends on what you mean by "special".  If I understand ANTLR 
correctly, "protected" rules are not merely internal: most importantly 
"evaluation" (so to speak) works differently for spliced v. tokenizing 
rules.  Spliced rules don't tokenize.  By "tokenizing" I mean the result 
of matching the subrule is bound to a token rather than being spliced. 
So one could also specify a difference in scoping, e.g. matching of a 
tokenizing subrule occurs in a local scope, whereas a spliced subrule is 
matched in the scope of the rule that does the splicing.

This is basically a variation on how macros and splicing work in 
Lisp/Scheme, which is why it is so useful.  Much of its usefulness is 
simply that it makes the language more expressive.

In fact, with splicing notation, we could allow a rule to be both 
tokenizing and non-tokenizing.  E.g.

	,A = ....  // splicing (non-tokenizing) and internal only
	FOO = (X Y Z)  // bisemous(?): hoisted to nextToken, but may be spliced too
	BAR = B FOO C    // delegation to tokenizing rule FOO
	BAZ = I ,FOO J   // splice non-tokenizing FOO
	BUZ = Q A R	// error? delegation to splice-only rule
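
To make the five cases above easier to check, here is a rough Python sketch (not ANTLR code, and not a description of how ANTLR works).  It assumes a made-up encoding: a leading "," on a definition marks the rule splice-only, and a leading "," on a reference asks for a splice.  The names GRAMMAR and check are invented for the illustration, and it treats the BUZ case as an error, which above is only raised as a question.

```python
# Hypothetical encoding of the five example rules (illustration only).
# ",A" as a key  -> A is splice-only (non-tokenizing, internal).
# ",FOO" in a body -> the caller asks for FOO to be spliced, not delegated.
GRAMMAR = {
    ",A": ["..."],            # body elided in the original example
    "FOO": ["X", "Y", "Z"],
    "BAR": ["B", "FOO", "C"],
    "BAZ": ["I", ",FOO", "J"],
    "BUZ": ["Q", "A", "R"],
}


def check(grammar):
    """Classify every reference to a defined rule as splice, delegate, or error."""
    splice_only = {name[1:] for name in grammar if name.startswith(",")}
    defined = splice_only | {n for n in grammar if not n.startswith(",")}
    report = []
    for name, body in grammar.items():
        for ref in body:
            target = ref.lstrip(",")
            if target not in defined:
                continue  # undefined names (X, Q, ...) are treated as terminals
            if ref.startswith(","):
                verdict = "splice"
            elif target in splice_only:
                # Assumption: a plain reference to a splice-only rule is rejected.
                verdict = "error: delegation to splice-only rule"
            else:
                verdict = "delegate"
            report.append((name.lstrip(","), target, verdict))
    return report


for rule, target, verdict in check(GRAMMAR):
    print(f"{rule} -> {target}: {verdict}")
```

The point of the sketch is only that the reference site, not the definition alone, decides between splicing and delegation.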

> The calling rule doesn't care about the fact, it just says
> "and at this point, I need the stuff defined in FOO".

It makes a difference whether or not "the stuff defined in FOO" is 
"tokenized" or spliced.  That's why IMO an explicit splicing operator is 
a good thing.  It allows the designer a degree of control over 
tokenizing, and it helps the reader understand the intention of the writer.

> 
> Plus: it does not have the properties of a macro expansion, does it? And
> if so: who cares? That's IMHO an implementation detail - from the
> language point of view it's just a sequence of matches.

Well, my thinking about "splice" and "macro" is inspired by Scheme, but 
isn't quite the same.  I think of splicing in ANTLR ("calling" a 
protected subtemplate) as equivalent to referencing a #defined 
identifier in C.  (This may not be entirely accurate, but it's my 
understanding of the 2.7.5 documentation.)  There's a big difference 
indeed between expanding a macro and evaluating an expression.  If you 
splice a rule (expand a macro), you first construct the text by merging 
(a purely syntactic operation, no matching/eval/etc.), and the result is 
another piece of grammar to be matched.  But if you delegate, you first 
do the matches (in a new local scope), then tokenize the result (bind it 
to a single token) which is inserted.  So the components are *not* 
inserted into the syntax.

In a word, you end up with different token structures depending on 
whether you splice or delegate.  At least that's the idea.
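
A toy model of that difference, as a minimal Python sketch.  It assumes a rule body is just a list of literal strings plus ("splice", name) or ("delegate", name) references.  RULES, splice_expand and match are hypothetical names, and none of this is ANTLR's actual machinery.  Splicing is done as a purely syntactic merge before any matching; delegation matches the subrule separately and binds the matched text to a single token.

```python
# Toy grammar (illustration only): FOO is reachable both ways.
RULES = {
    "FOO": ["x", "y", "z"],
    "BAR": ["b", ("delegate", "FOO"), "c"],  # FOO's match becomes one token
    "BAZ": ["i", ("splice", "FOO"), "j"],    # FOO's body is merged into BAZ
}


def splice_expand(body, rules):
    """Syntactic step: replace each ("splice", name) by that rule's body."""
    out = []
    for item in body:
        if isinstance(item, tuple) and item[0] == "splice":
            out.extend(splice_expand(rules[item[1]], rules))
        else:
            out.append(item)
    return out


def match(body, text, pos, rules):
    """Match body at text[pos:]; return (parts, new_pos) or raise ValueError."""
    parts = []
    for item in splice_expand(body, rules):
        if isinstance(item, str):
            if not text.startswith(item, pos):
                raise ValueError(f"expected {item!r} at offset {pos}")
            parts.append(item)
            pos += len(item)
        else:
            # ("delegate", name): match in a separate call, then bind the
            # matched text to a single (name, text) token.
            _, target = item
            start = pos
            _, pos = match(rules[target], text, pos, rules)
            parts.append((target, text[start:pos]))
    return parts, pos


delegated, _ = match(RULES["BAR"], "bxyzc", 0, RULES)
spliced, _ = match(RULES["BAZ"], "ixyzj", 0, RULES)
print(delegated)  # FOO shows up as one bound token
print(spliced)    # FOO's components appear inline
```

In the delegated case the caller sees FOO as one unit; in the spliced case the components x, y, z sit directly in the caller's result.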

> 
> I'd propose to call those rules "internal" - as stated, they cannot be
> directly accessed from outside of the Lexer (in the rule matching
> meaning) and "internal" also expresses the Java protected behaviour.

Internal/external, hidden/exposed, etc. - that's all distinct from the 
core issue of splicing v. tokenizing, no?  Which might be construed more 
usefully as syntactic v. semantic splicing.  I suppose one might argue 
that the internal/external distinction is itself an irrelevant 
implementation detail - what counts is tokenizing.

Thanks,

gregg