[antlr-interest] [BUG] 3.0b4 no complaint on parser reference to lexical fragment

Sun Nov 12 21:21:14 PST 2006

On 13. Nov 2006, at 5:31 , John B. Brodie wrote:

>
>> The key point to see here is that the parser does not "call" a  
>> lexer rule!
>
> Of course.      ...where, in this thread, was it stated otherwise.

Nowhere in particular. I just wanted to make the point that rules and
token types are not required to be isomorphic. No offense meant.

>> When I stretch my mind a bit, I can even imagine that I'd actually  
>> want to
>> emit tokens for fragment rules.
>
> Of course. fragment rules are inside the lexer and so may call emit()
> as an action, just like any other lexer rule. And of course if any
> lexer rule wants to emit() multiple tokens it can do so, but the
> default lexer only buffers 1 token at a time so must be modified. See
> the lexer example i posted to this list (after Dr. Parr helped me)
> that emits multiple tokens when trying to differentiate between
> integers, reals and the range ".."  operator.

Yes, I recall that.

>> what have you won? you might get around fiddling with the token's  
>> text's in
>> the parser, you could possibly set up a finer control of token  
>> channels, etc.
>> This might be a bad idea, but is interesting nonetheless. ;)
>
> who is talking about any of that in this thread?

As a matter of fact, I gave an example where I could imagine doing
something of the sort we have been talking about.

>
> There is an interface between a Parser and a Lexer. The Lexer  
> produces a
> stream of Tokens which the Parser consumes.

Exactly. The question now is, what is that interface? Is it the set
of lexer rules? Or is it the set of token types?

> And of what type should these lexer produced Tokens be?

The set is defined by the terminal symbols of the language.

> Should that type include the tokens{}, lexer rules *AND* fragment  
> rules?

IMHO, yes. It is only natural that fragment rules generate token types,
too, because once you label them in an enclosing rule, ANTLR has to
generate a token for them. fragment only means that the rule is not
invoked directly when nextToken is invoked. There are a variety of
reasons why a rule might be declared as fragment, see below.
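
To make that distinction concrete, here is a minimal sketch in Python.
It is a toy model, not ANTLR-generated code: the rule table, the
numbering of token types and the names next_token and tokenize are all
assumptions made up for illustration. It only shows that a fragment
rule can own a token type while never being an entry point for the
lexer.

```python
import re

# Hypothetical toy lexer, not ANTLR output: every rule, fragment or not,
# gets its own token type, but only non-fragment rules are tried when
# the next token is requested.
RULES = [
    # (name, regex, is_fragment)
    ("DIGIT", r"[0-9]", True),
    ("INT", r"[0-9]+", False),
    ("ID", r"[a-z]+", False),
]

# Starting the numbering at 4 is an arbitrary choice for this sketch.
TOKEN_TYPES = {name: i + 4 for i, (name, _, _) in enumerate(RULES)}


def next_token(text, pos):
    """Return (type_name, lexeme, new_pos) using non-fragment rules only."""
    for name, pattern, is_fragment in RULES:
        if is_fragment:
            continue  # fragments are never entry points
        m = re.compile(pattern).match(text, pos)
        if m:
            return name, m.group(), m.end()
    raise ValueError(f"no rule matches at position {pos}")


def tokenize(text):
    """Collect (type_name, lexeme) pairs, skipping single spaces."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        name, lexeme, pos = next_token(text, pos)
        out.append((name, lexeme))
    return out
```

In this model DIGIT has an entry in TOKEN_TYPES, so a parser could
refer to it, but tokenize never yields it; a parser alternative that
expects DIGIT would simply never match.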

> Or should that type be restricted *AND ENFORCED* to just tokens{} and
> lexer rules --- with fragments excluded.

One could argue that way. I see your point, though I don't share it.
I don't see a problem with the current implementation. The key point I
was referring to earlier, namely that the parser doesn't call lexer
rules, means that I cannot arbitrarily call fragment rules from the
parser. Thus the lexer implementation is still hidden from the parser
and the only interface is the set of token types.
I argue that there is no purpose in excluding the token types induced
by fragment rules, since referencing them in the parser does no harm
(other than preventing it from matching anything, but there are
numerous other ways to achieve that...).
Actually preventing a grammar author from using such a token type is
much more involved. It means you either have to change the way fragment
rules are represented internally, or you have to check all actions to
catch any attempt to change a token's type to a forbidden value.
That sounds too difficult, and I'd call it problematic. It would be
bound to be a fragile implementation.

Furthermore, I think there are bona fide reasons to make lexer rules
fragment rules, other than them being simple helper rules. One is to
reduce lookahead in the generated tokens rule. Maybe there's a need to
disambiguate between some rules, too. I'm sure there are situations
where just setting a fragment rule's type via a 'switch' lexer rule is
easier than restructuring the grammar in an unnatural way.

What exactly is your gripe with this? Are you concerned that one might
reference a token type that is associated with a fragment rule, thus
preventing the parser rule from matching?
I have a hard time believing that this is a real-world scenario.

kind regards,

-k
-- 
Kay Röpke
http://classdump.org/