[antlr-interest] [BUG] 3.0b4 no complaint on parser reference to lexical fragment
Kay Roepke
kroepke at classdump.org
Sun Nov 12 19:38:33 PST 2006
On 13. Nov 2006, at 4:02 , John B. Brodie wrote:
>
> and is this a feature or a bug?
this is a feature.
> i am trying to assert that this is a bug.
i realize ;)
> from the fact that "fragment X <...snip...> rule X cannot and does
> not return a token by itself."
differentiate between 'rule X' and...
> we must conclude that "<...snip...>, thus passing X tokens to the parser."
> shall *not* be permitted.
...token type X.
> i understand that the current antlr v3 implementation (3.b04) does not
> consider references to lexical fragments by the parser as an error. i am
> just trying to assert that this current implementation is problematic.
This has not been changed in the upcoming b5.
The key point here is that the parser does not "call" a lexer rule!
It merely reads from a token stream, which in turn calls nextToken() on the lexer.
How the lexer arrives at the token it returns is unspecified, and this is a GoodThing(tm).
It means that you could use a lexer with a different internal structure (say, different
rules) or even a lexer that was not generated by ANTLR. You could write a simple wrapper
around flex or hand-code a lexer if you have special needs, such as performance.
So, even if it might look as if the parser is calling rule X in the lexer class, it's not!
The parser isn't concerned with the lexer rules at all; it's only interested in the type
of a particular token (which is also called X). Maybe this overlap in terminology
is the source of the confusion.
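To make the separation concrete, here is a minimal sketch in plain Java. It does not use the ANTLR runtime: Tok, TokSource, HandLexer and TinyParser are hypothetical names invented for this illustration, and the token type value 4 is arbitrary. The hand-coded lexer has no "rule X" anywhere, yet the parser is perfectly happy, because all it ever looks at is the token type X.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal types for illustration; not the ANTLR runtime API.
final class Tok {
    static final int EOF = -1, X = 4; // arbitrary token type numbers
    final int type;
    final String text;
    Tok(int type, String text) { this.type = type; this.text = text; }
}

interface TokSource {
    Tok nextToken();
}

// A hand-coded lexer: there is no "rule X" here, only tokens of type X.
final class HandLexer implements TokSource {
    private final String input;
    private int pos = 0;
    HandLexer(String input) { this.input = input; }

    public Tok nextToken() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++; // skip blanks
        if (pos >= input.length()) return new Tok(Tok.EOF, "<EOF>");
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != ' ') pos++;
        return new Tok(Tok.X, input.substring(start, pos));
    }
}

// The "parser" only compares token types; it never calls into lexer rules.
final class TinyParser {
    static List<String> matchXs(TokSource src) {
        List<String> texts = new ArrayList<>();
        for (Tok t = src.nextToken(); t.type != Tok.EOF; t = src.nextToken()) {
            if (t.type != Tok.X) throw new IllegalStateException("expected X");
            texts.add(t.text);
        }
        return texts;
    }
}
```

Any other TokSource implementation could be swapped in without touching TinyParser, which is the point being made about lexer internals staying unspecified.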
A rule X implies that the token it returns has type X, but that is not enforced at all.
Returning a token with a different type is the exception rather than the rule, but sometimes
it's the easiest way out (as in lexing number literals).
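The number-literal case might look roughly like the following, again as a plain Java sketch rather than generated ANTLR code. NumberRule, typeOfNumber and the type values 5 and 6 are assumptions made up for this example. The idea is that one matching routine decides, while it scans, which token type the result gets.

```java
// Hypothetical helper for illustration; not generated by ANTLR.
final class NumberRule {
    static final int INT = 5, FLOAT = 6; // arbitrary token type numbers

    // Mimics a single lexer rule that picks the token type while matching:
    // digits only give INT, digits '.' digits give FLOAT.
    static int typeOfNumber(String text) {
        int i = 0;
        while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
        if (i == 0) throw new IllegalArgumentException("not a number: " + text);
        if (i == text.length()) return INT;
        if (text.charAt(i) != '.') throw new IllegalArgumentException("not a number: " + text);
        int j = i + 1;
        while (j < text.length() && Character.isDigit(text.charAt(j))) j++;
        if (j == i + 1 || j != text.length()) throw new IllegalArgumentException("not a number: " + text);
        return FLOAT;
    }
}
```

A single "number" rule that returns either INT or FLOAT saves the lexer from having to choose between two rules that share a long common prefix.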
I think it would be unnatural to forbid the use of token types induced by fragment rules,
and there's no need to do that either.
When I stretch my mind a bit, I can even imagine that I'd actually
want to emit tokens for
fragment rules. Although I realize that I might totally confuse the
issue at hand right now, I
cannot refrain from writing this down ;)
Ok, what am I thinking?
Consider the following rules in the lexer:
FOO
    : start=ID c=C end=ID
      { emit(FOO); emit(start); emit(c); emit(end); } // this is pseudocode, but I think you get what I mean
    ;
fragment C : '0x01223'; // some magic thing that should not be a normal token
fragment ID : 'a'..'z' ('a'..'z'|'0'..'9')*;
Suppose you have built a lexer subclass that can emit multiple tokens
for one lexer rule (ANTLR
by default emits a maximum of one token per lexer rule).
In the parser you'd like to receive multiple tokens when you
reference FOO. You could write:
somerule : FOO ID C ID ; // FOO generates ID C ID even though the rules are fragments!
What have you gained? You might avoid fiddling with the tokens' text in the parser, and you
could possibly set up finer control of token channels, etc.
This might be a bad idea, but is interesting nonetheless. ;)
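For the curious, the multi-token idea can be sketched in plain Java with a queue of pending tokens. None of this is the ANTLR runtime API: MultiEmitLexer, its emit() and nextToken(), and the token type numbers are hypothetical stand-ins for the "lexer subclass" mentioned above. One more assumption: since ID as written would happily swallow the digits of the magic string, this sketch makes the ID loop stop where the magic string begins.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a lexer subclass; none of these names are ANTLR runtime API.
final class MultiEmitLexer {
    static final int EOF = -1, FOO = 4, ID = 5, C = 6; // arbitrary token type numbers
    static final String MAGIC = "0x01223";

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    private final String input;
    private int pos = 0;
    private final Deque<Tok> pending = new ArrayDeque<>();

    MultiEmitLexer(String input) { this.input = input; }

    // Queue a token instead of overwriting a single "current token" slot.
    private void emit(int type, String text) { pending.add(new Tok(type, text)); }

    // Hand out queued tokens first; run the FOO rule only when the queue is empty.
    Tok nextToken() {
        if (pending.isEmpty()) matchFoo();
        return pending.poll();
    }

    // FOO : start=ID c=C end=ID, emitting FOO plus its three parts.
    private void matchFoo() {
        if (pos >= input.length()) { emit(EOF, "<EOF>"); return; }
        int begin = pos;
        String start = matchId();
        if (!input.startsWith(MAGIC, pos)) throw new IllegalStateException("expected C at " + pos);
        pos += MAGIC.length();
        String end = matchId();
        emit(FOO, input.substring(begin, pos));
        emit(ID, start);
        emit(C, MAGIC);
        emit(ID, end);
    }

    // ID : 'a'..'z' ('a'..'z'|'0'..'9')*, stopping early where MAGIC begins (a simplification).
    private String matchId() {
        int begin = pos;
        if (pos >= input.length() || !isLetter(input.charAt(pos)))
            throw new IllegalStateException("expected ID at " + pos);
        pos++;
        while (pos < input.length() && !input.startsWith(MAGIC, pos)
                && (isLetter(input.charAt(pos)) || isDigit(input.charAt(pos)))) pos++;
        return input.substring(begin, pos);
    }

    private static boolean isLetter(char ch) { return ch >= 'a' && ch <= 'z'; }
    private static boolean isDigit(char ch) { return ch >= '0' && ch <= '9'; }

    // Convenience for trying it out: collect the token types up to (not including) EOF.
    static List<Integer> tokenTypes(String text) {
        MultiEmitLexer lexer = new MultiEmitLexer(text);
        List<Integer> types = new ArrayList<>();
        for (Tok t = lexer.nextToken(); t.type != EOF; t = lexer.nextToken()) types.add(t.type);
        return types;
    }
}
```

With these assumptions, MultiEmitLexer.tokenTypes("ab0x01223cd") yields the types FOO, ID, C, ID in that order, which is exactly the sequence somerule asks for.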
I need coffee. Quick.
cheers,
-k
--
Kay Röpke
http://classdump.org/