[antlr-interest] terminology: "protected"
Gregg Reynolds
dev at arabink.com
Thu Jan 12 04:28:00 PST 2006
Martin Probst wrote:
> I agree that "protected" is a bad wording, but I don't quite agree with
> your splicing thing too. Basically you're requiring the user to specify
> within the rule that is using the "spliced" rule that the rule is
> spliced.
Hello Martin,
Thanks for the feedback; comments below.
>
>>which tells the reader explicitly that we're dealing with a
>>non-tokenizing splice (no need to look up the defn).
>
>
> I don't see why that would be desirable. To me, a "spliced" rule is
> nothing special, except that it's internal to the lexer (maybe that
> would be a much nicer wording?), and cannot be directly accessed from
> the parser.
I guess it depends on what you mean by "special". If I understand antlr
correctly, "protected" rules are not merely internal: most importantly
"evaluation" (so to speak) works differently for spliced v. tokenizing
rules. Spliced rules don't tokenize. By "tokenizing" I mean the result
of matching the subrule is bound to a token rather than being spliced.
So one could also specify a difference in scoping, e.g. matching of a
tokenizing subrule occurs in a local scope, whereas a spliced subrule is
matched in the scope of the rule that does the splicing.
This is basically a variation on how macros and splicing work in
Lisp/Scheme, which is why it is so useful. Much of its usefulness is
simply that it makes the language more expressive.
In fact, with splicing notation, we could allow a rule to be both
tokenizing and non-tokenizing. E.g.
,A = .... // splicing (non-tokenizing) and internal only
FOO = (X Y Z) // bisemous(?): hoisted to nextToken, but may be spliced too
BAR = B FOO C // delegation to tokenizing rule FOO
BAZ = I ,FOO J // splice non-tokenizing FOO
BUZ = Q A R // error? delegation to splice-only rule
> The calling rule doesn't care about the fact, it just says
> "and at this point, I need the stuff defined in FOO".
It makes a difference whether or not "the stuff defined in FOO" is
"tokenized" or spliced. That's why IMO an explicit splicing operator is
a good thing. It allows the designer a degree of control over
tokenizing, and it helps the reader understand the intention of the writer.
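To make the proposed notation a bit more concrete, here is a rough sketch in Python (not ANTLR, and not how ANTLR works internally). It reads the five example rules above and labels each reference to a defined rule as a splice, a delegation, or an error. The function names parse and classify are made up for illustration, undefined names such as X or Q are simply treated as opaque terminals, and treating BUZ as an error is only the tentative reading marked "error?" above.

```python
# Toy checker for the proposed comma notation; all names here are hypothetical.
GRAMMAR = """
,A = ....
FOO = (X Y Z)
BAR = B FOO C
BAZ = I ,FOO J
BUZ = Q A R
"""

def parse(src):
    """Return {rule_name: (splice_only, body_tokens)}."""
    rules = {}
    for line in src.strip().splitlines():
        head, body = (s.strip() for s in line.split("=", 1))
        splice_only = head.startswith(",")  # ",A" declares a splice-only rule
        tokens = body.replace("(", " ").replace(")", " ").split()
        rules[head.lstrip(",")] = (splice_only, tokens)
    return rules

def classify(rules):
    """Return (rule, reference, kind) for each reference to a defined rule."""
    result = []
    for name, (_, body) in rules.items():
        for tok in body:
            ref = tok.lstrip(",")
            if ref not in rules:
                continue  # undefined names are treated as opaque terminals
            if tok.startswith(","):
                kind = "splice"    # explicit splice at the point of use
            elif rules[ref][0]:
                kind = "error"     # plain reference to a splice-only rule
            else:
                kind = "delegate"  # plain reference to a tokenizing rule
            result.append((name, ref, kind))
    return result

report = classify(parse(GRAMMAR))
for rule, ref, kind in report:
    print(f"{rule} -> {ref}: {kind}")
```

The point of the sketch is only that the splice/delegate decision is visible at the point of use, so a reader (or a checker) does not have to look up the definition of FOO to know which is meant.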
>
> Plus: it does not have the properties of a macro expansion, does it? And
> if so: who cares? That's IMHO an implementation detail - from the
> language point of view it's just a sequence of matches.
Well, my thinking about "splice" and "macro" is inspired by scheme, but
isn't quite the same. I think of splicing in antlr ("calling" a
protected subtemplate) as equivalent to referencing a #defined
identifier in C. (This may not be entirely accurate, but it's my
understanding of the 2.7.5 documentation.) There's a big difference
indeed between expanding a macro and evaluating an expression. If you
splice a rule (expand a macro), you first construct the text by merging
(a purely syntactic operation, no matching/eval/etc.), and the result is
another piece of grammar to be matched. But if you delegate, you first
do the matches (in a new local scope), then tokenize the result (bind it
to a single token) which is inserted. So the components are *not*
inserted into the syntax.
In a word, you end up with different token structures depending on
whether you splice or delegate. At least that's the idea.
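A small Python sketch of that difference, under heavy assumptions: this is a toy model, not ANTLR's implementation, and the rule table, the ("splice", name) / ("delegate", name) tuples, and the functions inline and match are all invented for illustration. Splicing is modelled as a purely syntactic inlining step done before any matching; delegation matches the subrule separately and binds the matched text to a single token.

```python
# Toy model: a rule body is a list of literals and tagged references.
RULES = {
    "FOO": ["x", "y", "z"],
    "BAR": ["b", ("delegate", "FOO"), "c"],  # like BAR = B FOO C
    "BAZ": ["i", ("splice", "FOO"), "j"],    # like BAZ = I ,FOO J
}

def inline(body):
    """Syntactic step only: replace each splice with the spliced rule's body."""
    out = []
    for item in body:
        if isinstance(item, tuple) and item[0] == "splice":
            out.extend(inline(RULES[item[1]]))
        else:
            out.append(item)
    return out

def match(body, text, pos=0):
    """Match body against text starting at pos; return (parts, end_pos)."""
    parts = []
    for item in inline(body):
        if isinstance(item, str):
            if not text.startswith(item, pos):
                raise ValueError(f"expected {item!r} at position {pos}")
            parts.append(item)
            pos += len(item)
        else:
            _, ref = item  # only delegations remain after inlining
            _, end = match(RULES[ref], text, pos)  # match in its own call
            parts.append((ref, text[pos:end]))     # bind result to one token
            pos = end
    return parts, pos

delegated, _ = match(RULES["BAR"], "bxyzc")
spliced, _ = match(RULES["BAZ"], "ixyzj")
print(delegated)  # FOO shows up as one nested token
print(spliced)    # FOO's components sit directly in the caller's sequence
```

In this model the delegating rule ends up with three parts, one of which is a FOO token, while the splicing rule ends up with five flat parts and no trace of FOO.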
>
> I'd propose to call those rules "internal" - as stated, they cannot be
> directly accessed from outside of the Lexer (in the rule matching
> meaning) and "internal" also expresses the Java protected behaviour.
Internal/external, hidden/exposed, etc. - that's all distinct from the
core issue of splicing v. tokenizing, no? Which might be construed more
usefully as syntactic v. semantic splicing. I suppose one might argue
that the internal/external distinction is itself an irrelevant
implementation detail - what counts is tokenizing.
Thanks,
gregg