[antlr-interest] [BUG] 3.0b4 no complaint on parser reference to lexical fragment
Micheal J
open.zone at virgin.net
Mon Nov 13 17:43:19 PST 2006
Hi,
> >> There is an interface between a Parser and a Lexer. The Lexer
> >> produces a
> >> stream of Tokens which the Parser consumes.
> >
> >Exactly. The question now is, what is that interface? Is it the set of
> >lexer rules? Or is it the set of token types?
>
> Apparently the set of rules is the same as the set of token types.
No. Token types may be defined in a tokens{..} block without an associated
rule.
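For instance (a minimal ANTLR 3 sketch; the grammar and token names are
hypothetical):

```antlr
grammar Example;

// INDENT gets a token type but has no lexer rule, so the set of
// token types is not the same as the set of lexer rules.
tokens {
    INDENT;        // imaginary token, emitted only by action code
    PLUS = '+';    // token type aliased to a literal
}

stat : INDENT? expr ';' ;
expr : INT (PLUS INT)* ;

INT : '0'..'9'+ ;
WS  : ' ' { $channel = HIDDEN; } ;
```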
> >> And of what type should these lexer produced Tokens be?
> >
> >The set is defined by the terminal symbols of the language.
>
> Yes. and as we have both pointed out to each other, lexical
> fragments do not represent terminal symbols of the language.
Not quite. They just do not [normally] emit tokens. I'd have to double-check
whether that can be overridden with action code.
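Roughly (a hedged ANTLR 3 sketch; the rule names are hypothetical):

```antlr
// DIGIT never produces a token of its own; it can only be
// referenced from other lexer rules.
fragment DIGIT : '0'..'9' ;

// INT tokens reach the parser; DIGIT tokens [normally] do not.
INT : DIGIT+ ;
```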
> >To actually prevent a grammar author from using that token type is much
> >more involved. It means you either have to change the way fragment
> >rules are represented internally, or you have to check all actions to
> >catch any attempt to change a token's type to a forbidden value. That
> >sounds too difficult and I'd call that problematic. It'd be
> >bound to be a fragile implementation.
>
> I envisioned that the code that handles token references in
> parser rules would do the check, not any code in lexer rules
> that sets the token type.
Parsers [quite rightly] know nothing about lexer rules or fragments. They
just expect a stream of tokens (with token types from their token type
vocabulary).
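The subject-line bug in miniature (a hypothetical ANTLR 3 combined-grammar
sketch):

```antlr
// ANTLR 3.0b4 emits no complaint here, yet 'num' can never
// match at runtime: the lexer only ever emits INT tokens,
// never DIGIT tokens.
num : DIGIT ;

INT : DIGIT+ ;
fragment DIGIT : '0'..'9' ;
```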
> The file produced by the lexer generation code containing the
> assigned token types (is it the *.tokens file?) would need to
> include an additional flag for each token type to indicate
> whether or not that token type was induced by a lexical fragment
> (or maybe just not write fragment token types to that file in
> the first place?). The parser generation code would then use
> that flag to perform the error check.
>
> I am sure I have oversimplified this checking. Not sure how
> the handling of a tokens{} section would impact this checking.
Interesting idea. While it certainly could be done, I can't help feeling
that this is really a training issue.
My reasoning? Well:
- there are legitimate reasons for sending token types named after a
fragment rule to the parser, as Kay pointed out.
- the option exists to name fragment rules (and their auto-generated token
type namesake) such that it is impossible to misuse unintentionally [e.g.
DIGIT_NotForParser, DoNotUseInParser_DIGIT, LexerInternal_DIGIT]
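Something like (using one of the naming schemes suggested above):

```antlr
// The prefix makes any accidental reference from a parser rule
// conspicuous in the grammar source.
fragment LexerInternal_DIGIT : '0'..'9' ;

INT : LexerInternal_DIGIT+ ;
```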
> >I have a hard time to believe that this is a real-world scenario.
>
> I have helped new users to resolve this on at least 2
> occasions. Most recently just this past Sunday immediately
> before I started this thread.
As I said, this sounds like a training issue.
Micheal
-----------------------
The best way to contact me is via the list/forum. My time is very limited.