[antlr-interest] terminology: "protected"

Thu Jan 12 07:17:33 PST 2006

Martin Probst wrote:
> 
...
> 
> Just an example:
> FOO: "abc" BAR;
> BAR: "def";
> 
> vs.
> 
> FOO: "abc" BAR;
> protected BAR: "def";
> 
> vs.
> 
> FOO: "abc" "def";
> (protected) BAR: "def";
> 
> All of these will currently result in a single FOO token containing
> "abcdef" on input "abcdef". There is no observable difference to the
> user, except for non-determinism problems if something else than BAR can
> match "def".
> 

Yes, but what if you want to mess about with (sub) matched elements
within a rule action?  With A : B C you can reference B and C, even if
they have an internal (sub) structure.  But with A : ,B C you can't
reference B, only its "contents".  Can A's action reference names in B
or C?  Or only the names 'B' and 'C'?  Which is to say that in a sense
it's all about naming.
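
To make the naming point concrete, here is a rough Python sketch of the
difference between a bound reference (A : B C) and a spliced one
(A : ,B C).  Everything in it (the RULES table, the "ref"/"splice"/"lit"
tags, the match function) is a toy of my own for illustration; none of
it is Antlr API or Antlr's actual behaviour.

```python
# Hypothetical mini-matcher; none of these names come from Antlr.
RULES = {
    "A":  [("ref", "B"), ("ref", "C")],      # A  : B C
    "A2": [("splice", "B"), ("ref", "C")],   # A2 : ,B C
    "B":  [("lit", "x"), ("lit", "y")],
    "C":  [("lit", "z")],
}

def match(rule, text, pos=0):
    """Match `rule` at `pos`; return (children, new_pos)."""
    children = []
    for kind, arg in RULES[rule]:
        if kind == "lit":
            if not text.startswith(arg, pos):
                raise ValueError(f"expected {arg!r} at position {pos}")
            children.append(("lit", arg))
            pos += len(arg)
        else:
            sub, pos = match(arg, text, pos)
            if kind == "ref":
                children.append((arg, sub))  # bound: reachable under its name
            else:
                children.extend(sub)         # spliced: only the contents survive
    return children, pos

bound, _ = match("A", "xyz")
spliced, _ = match("A2", "xyz")
print([name for name, _ in bound])    # names visible to A's action
print([name for name, _ in spliced])  # names visible to A2's action
```

In the bound case A's action can ask for "B"; in the spliced case there
is no "B" left to ask for, only what B matched.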

(At this point I guess I should remind you that this design emerged from 
my thinking about a more general parser/template language, and may or 
may not clash with the antlr implementation.  But I think the language 
and concept present at least an interesting possibility for Antlr.)

> 
>>But if you delegate, you first 
>>do the matches (in a new local scope), then tokenize the result (bind it 
>>to a single token) which is inserted.  So the components are *not* 
>>inserted into the syntax.
> 
> 
> As far as I know if you "delegate", e.g. do not have a protected rule,
> it does not return a Token instead of a String or something - there is
> also no (Java) implementation difference, but that should not matter to
> the user anyways.
> 
See the article on protected rules at 
http://www.jguru.com/faq/view.jsp?EID=125:  "...'You get a Token object 
for every lexical rule in your lexer grammar.' This is indeed the 
default case for ANTLR's lexer grammars... To distinguish these "helper" 
rules from rules that result in tokens, use the protected modifier."

Mr. Parr chose "protected" by analogy: it controls access visibility in 
Java, and in Antlr "helper" rules (protected ones) are not "seen" by the 
parser - they don't get "hoisted" into the nextToken logic.  That may be 
true, but I argue that it's not what really counts.  It's the token 
binding that matters.

Unfortunately, in the same article is the statement "I now recognize
this approach is a mistake.  I have a number of proposals to fix
this..."  So maybe version 3 does something completely different; I
haven't had time to look at v3.

And later:  "By definition, all lexical rules return Token objects 
(ANTLR optimizes away many of these object creations, however), but only 
the Token objects of non-protected rules get pulled out of the lexer 
itself."  I.e. lexer and parser stuff is treated differently?  I'm not 
sure of the exact implications of this, but it seems to me that 
"splicing" captures the same idea (or addresses the same issue).

(I saw your note below about using "tokenizing" in this sense.  I agree, 
it isn't the right term for my meaning.  Until I can think of better 
metalanguage, I'll surrender and say "non-splicing rule" instead of 
"tokenizing rule".  ;)

> I'm not sure what you're referring to with local scope, but if mean that
> a "spliced" rule should be able to access stuff from the scope of the
> "calling" rule, then this is -sorry- pure madness. Macros are evil! A

Right, I agree with that.  The question being "Can an action attached to 
a spliced rule make reference to anything outside of the spliced rule 
itself?", right?  Well now, that's kind of interesting, since ordinary 
macros in other languages don't consist of a pair of (rule, action).  So 
what is the naturally intuitive way to interpret a spliced rule with 
actions attached?  To me it is: splice the rule's grammar production 
syntactically but still restrict the action to the local scope (the one 
in which the action is defined).  In other words, scope things exactly 
the same way as with non-spliced rules, but then "unpack" the list of 
matches and insert them into the splicing ("calling") rule.
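
As a rough illustration of that scoping rule, here is a small Python
sketch.  The functions rule_a and rule_b and the splice flag are
hypothetical stand-ins of mine, not anything Antlr generates: B's action
only ever sees B's own matches, and splicing merely decides whether the
result is unpacked into A or kept as one unit.

```python
def rule_b(text, pos):
    """Match two characters; B's action upper-cases them."""
    matches = [text[pos], text[pos + 1]]    # B's local scope
    result = [m.upper() for m in matches]   # the action sees only `matches`
    return result, pos + 2

def rule_a(text, pos=0, splice=True):
    """Match one character, then B (spliced or bound)."""
    prefix = text[pos]                      # A's local; invisible to B's action
    items, pos = rule_b(text, pos + 1)
    if splice:
        return [prefix, *items], pos        # unpack B's matches into A
    return [prefix, items], pos             # keep B's matches bound as one unit

print(rule_a("abc"))                # spliced
print(rule_a("abc", splice=False))  # bound
```

Note that rule_b never receives `prefix`, so there is nothing
macro-like about it: the splice happens after the action has run.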

>>
>>Internal/external, hidden/exposed, etc. - that's all distinct from the 
>>core issue of splicing v. tokenizing, no?
> 
> Well, I'm arguing that there is no splicing issue, just an
> internal/external issue.
..
> 
> 
>>  Which might be construed more 
>>usefully as syntactic v. semantic splicing.  I suppose one might argue 
>>that the internal/external distinction is itself an irrelevant 
>>implementation detail - what counts is tokenizing.
> 
> 
> Internal vs. External has a huge impact, e.g.
> FOO: "abc" BAR;
> BAR: "def";
> BAZ: "def";
> does not work, as BAR and BAZ are identical. Add "protected" or
> "internal" or whatever to BAR, and it will work. The example is a bit
> contrived, but you get the idea.

Ok, agreed.  So as I see it, there are two (orthogonal?) issues: 1) how 
to specify that a rule is not to be included in the top-level nextToken 
logic (i.e. is protected/inner/hidden/etc.); and 2) how to indicate that
a reference to a rule should be either replaced by the rule grammar text 
(spliced), or bound to a structure of tokens (terminology to be determined.)

For (1), I agree we need a decoration of some kind on the name 
definition.  I rather like "hidden", but there might be something 
better.  Note that a "hidden" rule need not be used for splicing. For 
(2), the splice operator allows the client rule to control the 
tokenizing of the referenced rule, i.e. whether its results are bound
to the reference name.
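
For issue (1), a toy Python sketch of what a "hidden" decoration buys
you, using your FOO/BAR/BAZ example.  The make_lexer helper, the
(name, regex, hidden) rule table and the longest-match policy are all
assumptions of mine for illustration; Antlr's generated lexer works
differently and reports the clash when it analyses the grammar, not at
run time.

```python
import re

def make_lexer(rules):
    """rules: list of (name, regex, hidden). Hidden rules are helpers only."""
    visible = [(name, re.compile(rx)) for name, rx, hidden in rules if not hidden]

    def next_token(text, pos=0):
        hits = [(name, m.group()) for name, rx in visible
                if (m := rx.match(text, pos))]
        if not hits:
            raise ValueError(f"no rule matches at position {pos}")
        longest = max(len(s) for _, s in hits)
        best = [h for h in hits if len(h[1]) == longest]
        if len(best) > 1:
            raise ValueError("ambiguous: " + ", ".join(n for n, _ in best))
        return best[0]

    return next_token

BAR = "def"  # FOO reuses BAR's pattern whether or not BAR is hidden
clash = make_lexer([("FOO", "abc" + BAR, False),
                    ("BAR", BAR, False),
                    ("BAZ", "def", False)])
ok = make_lexer([("FOO", "abc" + BAR, False),
                 ("BAR", BAR, True),       # hidden: kept out of next_token
                 ("BAZ", "def", False)])

print(ok("abcdef"))  # FOO still matches the whole input
print(ok("def"))     # only BAZ competes for "def"
```

Calling clash("def") raises the ambiguity error, since BAR and BAZ both
claim the same text.  Whether FOO splices or binds BAR is a separate
question, which is the point of keeping (1) and (2) apart.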

In any case, thanks very much for the feedback - you've given me lots to 
think about.  I expect to write up something more formal about all this 
stuff before too long.  There are other aspects of subtemplate
referencing that I haven't addressed, most obviously passing parameters.
That I construe as a renaming operation - function calling without the
function call; and once you have the logic and syntax for renaming, you 
get very powerful template reuse capabilities.

BTW, don't forget that part of the motivation is to unify the language 
across Antlr grammars and StringTemplate, and part is to remove 
programmer-speak from the language definition.  The notion of splicing 
works quite well in ST, and eliminates the need for method call syntax:

"foo $bar()$ baz" becomes "foo ,bar baz".

And since parse grammars and output templates are fundamentally the same 
animal - grammars - we should be able to use the same operations and 
symbols in each.
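
A minimal Python sketch of the template side, assuming a ",name"
reference simply splices in the named template's text.  The expand
function and the dictionary of templates are hypothetical; this is not
StringTemplate's API, and a real syntax would need a way to tell a
splice from an ordinary comma.

```python
import re

def expand(text, templates):
    """Replace each ,name with the (recursively expanded) template `name`."""
    return re.sub(
        r",(\w+)",
        lambda m: expand(templates[m.group(1)], templates),
        text,
    )

templates = {"bar": "BAR", "outer": "foo ,bar baz"}
print(expand(templates["outer"], templates))
```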

cheers,

gregg