[antlr-interest] [v3] not including text in token. Still possible?
Kay Roepke
kroepke at dolphin-services.de
Sun Feb 5 19:54:30 PST 2006
Hi!
Am I mistaken, have I missed anything, or am I plain stupid? (Quite a
possibility...;))
In v2 one could suffix a literal or tokenref with a '!' and keep that
(token's) text from being included in the token, like this:
BAREWORD
    : '<'! ID (PACKAGEDELIM ID)* '>'!
      {token = null;}
    ;

protected PACKAGEDELIM
    : '::'
    ;

protected ID
    : ('a'..'z' | 'A'..'Z' | '_')+
    ;
Running this with tonight's version of v3 from the depot I get:
classDump:~/Projects/examples-v3/java/perl kroepke$ java Main Test_input.txt
tokens=package <openBC::Debug>;
seen packageStmt
seen declaration
tree=(PACKAGE <openBC::Debug>)
The angle brackets are still being included in the token.
This example is a bit contrived. What I was actually looking for was a
way to force the PACKAGEDELIM and ID rules not to generate a token by
themselves. I want all the text to end up in a single BAREWORD token.
For this I currently have to nullify the token after matching
'ID (PACKAGEDELIM ID)*'. Is there an easier way to do this? I'd rather
not generate all those tokens just to discard them later on.
If I don't set 'token = null' I end up with an ID token, which causes
big trouble in the parser later on. This is obviously not what I want:
I ask the lexer for a BAREWORD token and get back an ID token...
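To make the behavior I'm after concrete, here is a minimal hand-written
sketch in plain Java. It is not ANTLR-generated code and does not use the
ANTLR runtime; the class name BarewordSketch, the Token class, and the
Type enum are all hypothetical and exist only for illustration. The
helper methods standing in for ID and PACKAGEDELIM only advance the
input, and the BAREWORD method is the only one that creates a token,
whose text leaves out the angle brackets.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hand-written lexer, not the ANTLR API.
class BarewordSketch {
    // Hypothetical token types for this sketch.
    enum Type { BAREWORD, OTHER }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return "[" + type + ":'" + text + "']"; }
    }

    private final String in;
    private int pos = 0;

    BarewordSketch(String in) { this.in = in; }

    private static boolean isIdChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    // Plays the role of the ID rule: consumes input, creates no token.
    private boolean matchId() {
        int start = pos;
        while (pos < in.length() && isIdChar(in.charAt(pos))) pos++;
        return pos > start;
    }

    // Plays the role of the PACKAGEDELIM rule: consumes '::', creates no token.
    private boolean matchPackageDelim() {
        if (in.startsWith("::", pos)) { pos += 2; return true; }
        return false;
    }

    // '<' ID (PACKAGEDELIM ID)* '>' as one token, brackets excluded from the text.
    // Returns null and restores the position if the input does not match.
    private Token matchBareword() {
        int save = pos;
        if (pos < in.length() && in.charAt(pos) == '<') {
            pos++;
            int textStart = pos;
            if (matchId()) {
                while (true) {
                    int mark = pos;
                    if (matchPackageDelim() && matchId()) continue;
                    pos = mark; // undo a partial '::' without a following ID
                    break;
                }
                if (pos < in.length() && in.charAt(pos) == '>') {
                    String text = in.substring(textStart, pos);
                    pos++; // consume '>' without adding it to the text
                    return new Token(Type.BAREWORD, text);
                }
            }
        }
        pos = save;
        return null;
    }

    List<Token> tokenize() {
        List<Token> out = new ArrayList<>();
        while (pos < in.length()) {
            Token t = matchBareword();
            if (t == null) {
                // Anything else becomes a one-character OTHER token.
                t = new Token(Type.OTHER, String.valueOf(in.charAt(pos++)));
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [[BAREWORD:'openBC::Debug'], [OTHER:';']]
        System.out.println(new BarewordSketch("<openBC::Debug>;").tokenize());
    }
}
```

The point of the sketch is only the shape of the result: one BAREWORD
token with the text 'openBC::Debug', and no separate ID or PACKAGEDELIM
tokens created along the way.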
Also, is there a way to get back the behavior of EA7 when it comes to
printing the tokens of a CommonTokenStream? It used to show a lot of
extra information about the tokens. A first glance at
CommonTokenStream.java didn't reveal the secret to me :(
Thanks,
Kay