[antlr-interest] [BUG] 3.0b4 no complaint on parser reference to lexical fragment

Mon Nov 13 06:53:10 PST 2006

>>
>> There is an interface between a Parser and a Lexer. The Lexer produces a
>> stream of Tokens which the Parser consumes.
>
>Exactly. The question now is, what is that interface? Is it the set
>of lexer rules? Or is it the set of token types?

Apparently the set of rules is the same as the set of token types.

>> And of what type should these lexer produced Tokens be?
>
>The set is defined by the terminal symbols of the language.

Yes, and as we have both pointed out to each other, lexical fragments
do not represent terminal symbols of the language.

>To actually prevent a grammar author from using that token type is much
>more involved. It means you either have to change the way fragment
>rules are represented internally, or you have to check all actions to
>catch any attempt to change a token's type to a forbidden value.
>That sounds too difficult and I'd call that problematic. It'd be
>bound to be a fragile implementation.

I envisioned that the code that handles token references in parser
rules would do the check, not any code in lexer rules that sets the
token type.

The file produced by the lexer generation code containing the assigned
token types (is it the *.tokens file?) would need to include an
additional flag for each token type to indicate whether or not that
token type was induced by a lexical fragment (or maybe just not write
fragment token types to that file in the first place?). The parser
generation code would then use that flag to perform the error check.
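
To make the idea concrete, here is a rough sketch in Python of the check
I have in mind. It is only an illustration, not how ANTLR works: the
trailing "fragment" marker on a line of the tokens file is the
hypothetical extra flag proposed above, and parse_tokens,
check_parser_refs and the sample rule names are invented for the example.

```python
def parse_tokens(text):
    """Parse lines of the assumed form NAME=TYPE or NAME=TYPE fragment."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition("=")
        parts = rest.split()
        # Map token name -> (token type, induced-by-fragment flag).
        table[name] = (int(parts[0]), "fragment" in parts[1:])
    return table


def check_parser_refs(refs, table):
    """Return one error message per parser reference to a fragment token."""
    errors = []
    for rule, token in refs:
        entry = table.get(token)
        if entry is not None and entry[1]:
            errors.append(f"rule '{rule}' references lexical fragment '{token}'")
    return errors


# Hypothetical tokens file contents with the extra flag on DIGIT.
TOKENS = """\
INT=4
DIGIT=5 fragment
WS=6
"""

table = parse_tokens(TOKENS)
# (parser rule name, referenced token name) pairs, as the parser
# generation code might collect them while reading parser rules.
errors = check_parser_refs([("expr", "INT"), ("expr", "DIGIT")], table)
for message in errors:
    print(message)
```

The check lives entirely on the parser side, so lexer actions that set
the token type are never inspected.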

I am sure I have oversimplified this checking. I am not sure how the
handling of a tokens{} section would impact this checking.

>Furthermore, I think there are bona-fide reasons to make lexer rules
>fragmented rules, other than them being simple helper rules.

Agreed. I never meant to ask for the removal of lexical fragments;
sorry if I was unclear about that.

>What exactly is your gripe with this? Are you concerned that one
>might reference a token type that is associated with a fragment rule,
>thus preventing the parser rule from matching?

Yes.
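
A toy model of the failure, sketched in Python rather than ANTLR (the
RULES table, tokenize and match_single are all made up for
illustration): DIGIT is a fragment that only helps define INT, so the
lexer never emits a DIGIT token, and a parser rule that asks for DIGIT
can never match.

```python
import re

# Hypothetical toy lexer rules: (name, pattern, is_fragment).
# DIGIT only helps define INT; it is never emitted on its own.
RULES = [
    ("INT", r"[0-9]+", False),
    ("DIGIT", r"[0-9]", True),
    ("WS", r"\s+", False),
]


def tokenize(text):
    """Emit (type, text) pairs, skipping fragments and whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern, is_fragment in RULES:
            if is_fragment:
                continue  # fragments never produce tokens by themselves
            m = re.compile(pattern).match(text, pos)
            if m:
                if name != "WS":
                    tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no lexer rule matches at position {pos}")
    return tokens


def match_single(tokens, expected_type):
    """Toy parser rule of the form `r : EXPECTED ;`."""
    return len(tokens) == 1 and tokens[0][0] == expected_type


tokens = tokenize("7")
print(tokens)
print(match_single(tokens, "DIGIT"))  # the rule a new user tends to write
print(match_single(tokens, "INT"))    # the rule that can actually match
```

Even for a single-digit input the token that arrives is an INT, which
is why a silent acceptance of the DIGIT reference is so confusing.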

>I have a hard time to believe that this is a real-world scenario.

I have helped new users to resolve this on at least 2 occasions. Most
recently just this past Sunday immediately before I started this thread.