[antlr-interest] terminology: "protected"
dev at arabink.com
Thu Jan 12 07:17:33 PST 2006
Martin Probst wrote:
> Just an example:
>
> FOO: "abc" BAR;
> BAR: "def";
>
> FOO: "abc" BAR;
> protected BAR: "def";
>
> FOO: "abc" "def";
> (protected) BAR: "def";
>
> All of these will currently result in a single FOO token containing
> "abcdef" on input "abcdef". There is no observable difference to the
> user, except for non-determinism problems if something else than BAR can
> match "def".
Yes, but what if you want to mess about with (sub) matched elements
within a rule action? With A : B C you can reference B and C, even if
they have an internal (sub) structure. But with A : ,B C you can't
reference B, only its "contents". Can A's action reference names in B
or C? Or only the names 'B' and 'C'? Which is to say that in a sense
it's all about naming.
(At this point I guess I should remind you that this design emerged from
my thinking about a more general parser/template language, and may or
may not clash with the antlr implementation. But I think the language
and concept present at least an interesting possibility for Antlr.)
>>But if you delegate, you first
>>do the matches (in a new local scope), then tokenize the result (bind it
>>to a single token) which is inserted. So the components are *not*
>>inserted into the syntax.
> As far as I know if you "delegate", e.g. do not have a protected rule,
> it does not return a Token instead of a String or something - there is
> also no (Java) implementation difference, but that should not matter to
> the user anyways.
See the article on protected rules at
http://www.jguru.com/faq/view.jsp?EID=125: "...'You get a Token object
for every lexical rule in your lexer grammar.' This is indeed the
default case for ANTLR's lexer grammars... To distinguish these "helper"
rules from rules that result in tokens, use the protected modifier."
Mr. Parr chose "protected" by analogy: it controls access visibility in
Java, and in Antlr "helper" rules (protected ones) are not "seen" by the
parser - they don't get "hoisted" into the nextToken logic. That may be
true, but I argue that it's not what really counts. It's the token
binding that matters.
Unfortunately, in the same article are the statements "I now recognize
this approach is a mistake. I have a number of proposals to fix
this..." So maybe version 3 does something completely different; I
haven't had time to look at v3.
And later: "By definition, all lexical rules return Token objects
(ANTLR optimizes away many of these object creations, however), but only
the Token objects of non-protected rules get pulled out of the lexer
itself." I.e. lexer and parser stuff is treated differently? I'm not
sure of the exact implications of this, but it seems to me that
"splicing" captures the same idea (or addresses the same issue).
(I saw your note below about using "tokenizing" in this sense. I agree,
it isn't the right term for my meaning. Until I can think of better
metalanguage, I'll surrender and say "non-splicing rule" instead of
"tokenizing rule". ;)
> I'm not sure what you're referring to with local scope, but if mean that
> a "spliced" rule should be able to access stuff from the scope of the
> "calling" rule, then this is -sorry- pure madness. Macros are evil! A
Right, I agree with that. The question being "Can an action attached to
a spliced rule make reference to anything outside of the spliced rule
itself?", right? Well now, that's kind of interesting, since ordinary
macros in other languages don't consist of a pair of (rule, action). So
what is the naturally intuitive way to interpret a spliced rule with
actions attached? To me it is: splice the rule's grammar production
syntactically but still restrict the action to the local scope (the one
in which the action is defined). In other words, scope things exactly
the same way as with non-spliced rules, but then "unpack" the list of
matches and insert them into the splicing ("calling") rule.
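As an illustration only, here is a small Python sketch of that reading. It is
not Antlr code and not how Antlr is implemented; the helper names (literal,
seq, bind, splice) and the rules X, Y, B, C, A are all made up for the
example. B's action sees only B's own matches in both cases; the only
difference is whether the caller gets one token named B or B's unpacked
contents.

```python
# Hypothetical model: a rule is a function (src, pos) -> (matches, new_pos)
# or None. Each match is a (name, text) pair.

def literal(name, text):
    def rule(src, pos):
        if src.startswith(text, pos):
            return [(name, text)], pos + len(text)
        return None
    rule.rule_name = name
    return rule

def seq(name, *parts, action=None):
    """Match parts in order; the action sees only this rule's own matches."""
    def rule(src, pos):
        matches = []
        for part in parts:
            result = part(src, pos)
            if result is None:
                return None
            found, pos = result
            matches.extend(found)
        if action is not None:
            action(dict(matches))  # local scope only
        return matches, pos
    rule.rule_name = name
    return rule

def bind(rule):
    """Ordinary reference: collapse the rule's matches into one named token."""
    def bound(src, pos):
        result = rule(src, pos)
        if result is None:
            return None
        found, end = result
        return [(rule.rule_name, "".join(text for _, text in found))], end
    return bound

def splice(rule):
    """Spliced reference (',B'): pass the rule's matches through unchanged."""
    return rule

seen_by_b = []  # records what B's action could see on each call
X = literal("X", "ab")
Y = literal("Y", "cd")
C = literal("C", "ef")
B = seq("B", X, Y, action=seen_by_b.append)

A_bound = seq("A", bind(B), C)      # A : B C
A_spliced = seq("A", splice(B), C)  # A : ,B C

bound_result = A_bound("abcdef", 0)
spliced_result = A_spliced("abcdef", 0)
```

With the bound reference, A's matches are B and C; with the spliced one they
are X, Y and C, and the name B is simply not there to be referenced.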
>>Internal/external, hidden/exposed, etc. - that's all distinct from the
>>core issue of splicing v. tokenizing, no?
> Well, I'm arguing that there is no splicing issue, just an
> internal/external issue.
>> Which might be construed more
>>usefully as syntactic v. semantic splicing. I suppose one might argue
>>that the internal/external distinction is itself an irrelevant
>>implementation detail - what counts is tokenizing.
> Internal vs. External has a huge impact, e.g.
> FOO: "abc" BAR;
> BAR: "def";
> BAZ: "def";
> does not work, as BAR and BAZ are identical. Add "protected" or
> "internal" or whatever to BAR, and it will work. The example is a bit
> contrived, but you get the idea.
Ok, agreed. So as I see it, there are two (orthogonal?) issues: 1) how
to specify that a rule is not to be included in the top-level nextToken
logic (i.e. is protected/inner/hidden/etc.); and 2) how to indicate that
a reference to a rule should be either replaced by the rule grammar text
(spliced), or bound to a structure of tokens (terminology to be determined).
For (1), I agree we need a decoration of some kind on the name
definition. I rather like "hidden", but there might be something
better. Note that a "hidden" rule need not be used for splicing. For
(2), the splice operator allows the client rule to control the
tokenizing of the referenced rule, i.e. whether its results are bound
to the reference name.
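To make issue (1) concrete, here is a toy Python sketch of a top-level
nextToken loop that skips hidden rules. It is an assumption-laden model, not
Antlr's generated lexer: every rule matches one fixed string, FOO is written
with BAR already expanded to "abcdef", and the class and function names are
invented for the example.

```python
class Rule:
    def __init__(self, name, text, hidden=False):
        self.name = name
        self.text = text      # simplification: each rule matches one fixed string
        self.hidden = hidden  # hidden rules are left out of next_token

def ambiguous_pairs(rules):
    """Return pairs of top-level (non-hidden) rules that match the same text."""
    visible = [r for r in rules if not r.hidden]
    return [(a.name, b.name)
            for i, a in enumerate(visible)
            for b in visible[i + 1:]
            if a.text == b.text]

def next_token(rules, src, pos):
    """Return (name, text, new_pos) for the longest visible match, or None."""
    best = None
    for r in rules:
        if r.hidden:
            continue
        if src.startswith(r.text, pos):
            if best is None or len(r.text) > len(best.text):
                best = r
    if best is None:
        return None
    return best.name, best.text, pos + len(best.text)

all_visible = [Rule("FOO", "abcdef"), Rule("BAR", "def"), Rule("BAZ", "def")]
bar_hidden = [Rule("FOO", "abcdef"), Rule("BAR", "def", hidden=True),
              Rule("BAZ", "def")]

conflicts_before = ambiguous_pairs(all_visible)
conflicts_after = ambiguous_pairs(bar_hidden)
token_for_def = next_token(bar_hidden, "def", 0)
token_for_abcdef = next_token(bar_hidden, "abcdef", 0)
```

Marking BAR hidden removes the BAR/BAZ clash at the top level while FOO still
consumes "abcdef", and nothing in this model says whether FOO splices BAR or
binds it, which is why the two issues look orthogonal.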
In any case, thanks very much for the feedback - you've given me lots to
think about. I expect to write up something more formal about all this
stuff before too long. There are other aspects of subtemplate
referencing that I haven't addressed, most obviously passing parameters.
That I construe as a renaming operation - function calling without the
function call; and once you have the logic and syntax for renaming, you
get very powerful template reuse capabilities.
BTW, don't forget that part of the motivation is to unify the language
across Antlr grammars and StringTemplate, and part is to remove
programmer-speak from the language definition. The notion of splicing
works quite well in ST, and eliminates the need for method call syntax:
"foo $bar()$ baz" becomes "foo ,bar baz".
And since parse grammars and output templates are fundamentally the same
animal - grammars - we should be able to use the same operations and
symbols in each.
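On the template side, a minimal Python sketch of the same splice might look
like this. The ",name" syntax is the proposal from this thread, not existing
StringTemplate syntax, and the expand function and template names are
hypothetical.

```python
import re

def expand(template, templates, depth=0):
    """Replace each ',name' reference with the expansion of that template.

    Limitation of this sketch: any comma directly followed by an identifier
    is treated as a splice, so literal text like "a,b" would be misread.
    """
    if depth > 20:
        raise RecursionError("template references nest too deeply")

    def replace(match):
        name = match.group(1)
        return expand(templates[name], templates, depth + 1)

    return re.sub(r",([A-Za-z_]\w*)", replace, template)

templates = {"bar": "BAR", "outer": "foo ,bar baz"}
expanded = expand(templates["outer"], templates)
```

The referenced template's text is inserted in place, with no call syntax and
no binding of a name in the caller.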