[antlr-interest] [BUG] 3.0b4 no complaint on parser reference to lexical fragment

Micheal J open.zone at virgin.net
Mon Nov 13 17:43:19 PST 2006


Hi,

> >> There is an interface between a Parser and a Lexer. The Lexer
> >> produces a stream of Tokens which the Parser consumes.
> >
> >Exactly. The question now is, what is that interface? Is it the set
> >of lexer rules? Or is it the set of token types?
> 
> Apparently the set of rules is the same as the set of token types.

No. Token types may be defined in a tokens{..} block without an associated
rule.
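
For example, something along these lines (a rough sketch, not from any real
grammar) defines a VARDEF token type that is used purely as an imaginary AST
node; there is no lexer rule behind it:

grammar Decl;
options { output = AST; }
tokens  { VARDEF; }           // token type with no corresponding lexer rule

decl : type ID ';' -> ^(VARDEF type ID) ;
type : 'int' | 'float' ;

ID : ('a'..'z' | 'A'..'Z')+ ;
WS : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;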

> >> And of what type should these lexer produced Tokens be?
> >
> >The set is defined by the terminal symbols of the language.
> 
> Yes. And as we have both pointed out to each other, lexical 
> fragments do not represent terminal symbols of the language.

Not quite. They just do not [normally] emit tokens. I'd have to double-check
whether that can be overridden with action code.
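
A minimal sketch of the distinction, in ANTLR 3 syntax (the names here are
made up, not taken from anyone's grammar):

INT : DIGIT+ ;          // real token rule: emits INT tokens to the parser

fragment
DIGIT : '0'..'9' ;      // helper rule: only ever matched as part of other
                        // lexer rules, never emitted as a token by itself

If memory serves, a non-fragment rule can still do something like
{ $type = DIGIT; } in an action and push DIGIT-typed tokens downstream,
which is exactly the kind of override I'd want to double-check.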

> >To actually prevent a grammar author from using that token type is
> >much more involved. It means you either have to change the way
> >fragment rules are represented internally, or you have to check all
> >actions to catch any attempt to change a token's type to a forbidden
> >value. That sounds too difficult and I'd call that problematic. It'd
> >be bound to be a fragile implementation.
> 
> I envisioned that the code that handles token references in 
> parser rules would do the check, not any code in lexer rules 
> that sets the token type.

Parsers [quite rightly] know nothing about lexer rules or fragments. They
just expect a stream of tokens (with token types from their token type
vocabulary).
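
From the parser's point of view that vocabulary is just a set of name/type
pairs, roughly what lands in the generated *.tokens file. Something like
(illustrative numbers only):

INT=4
ID=5
WS=6

Whether fragment names end up in that file as well is exactly the kind of
detail your suggestion below would hinge on; I'd have to check.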

> The file produced by the lexer generation code containing the 
> assigned token types (is it the *.tokens file?) would need to 
> include an additional flag for each token type to indicate 
> whether or not that token type was induced by a lexical fragment 
> (or maybe just not write fragment token types to that file in 
> the first place?). The parser generation code would then use 
> that flag to perform the error check.
> 
> I am sure I have oversimplified this checking. Not sure how 
> the handling of a tokens{} section would impact this checking.

Interesting idea. While it certainly could be done, I can't help feeling
that this is really a training issue.

My reasoning? Well:
- there are legitimate reasons for sending token types named after a
fragment rule to the parser, as Kay pointed out.
- the option exists to name fragment rules (and their auto-generated token
type namesakes) such that they are hard to misuse unintentionally [e.g.
DIGIT_NotForParser, DoNotUseInParser_DIGIT, LexerInternal_DIGIT] - see the
sketch below.
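
Something along these lines (just a sketch):

INT : LexerInternal_DIGIT+ ;

fragment
LexerInternal_DIGIT : '0'..'9' ;   // the name itself warns against
                                   // referencing it from a parser rule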

> >I have a hard time to believe that this is a real-world scenario.
> 
> I have helped new users to resolve this on at least 2 
> occasions. Most recently just this past Sunday immediately 
> before I started this thread.

As I said, this sounds like a training issue.


Micheal


-----------------------
The best way to contact me is via the list/forum. My time is very limited.


