[antlr-interest] [v3] not including text in token. Still possible?

Kay Roepke kroepke at dolphin-services.de
Sun Feb 5 19:54:30 PST 2006


Hi!

Am I mistaken, have I missed something, or am I just plain stupid?
(Quite a possibility... ;))

In v2 one could suffix a literal or token reference with '!' to keep
its text from being included in the token, like this:

BAREWORD
	:	'<'! ID (PACKAGEDELIM ID)* '>'!
	{token = null;}
	;

protected PACKAGEDELIM
	:	'::'
	;


protected ID	:	('a'..'z' | 'A'..'Z' | '_')+
	;


Running this with tonight's version of v3 from the depot, I get:
classDump:~/Projects/examples-v3/java/perl kroepke$ java Main Test_input.txt
tokens=package <openBC::Debug>;


seen packageStmt
seen declaration
tree=(PACKAGE <openBC::Debug>)

The angle brackets are still included in the token text.

This example is a bit contrived: what I was actually looking for was a
way to keep the PACKAGEDELIM and ID rules from generating tokens of
their own. I want all the text to end up in a single BAREWORD token.
Currently I have to nullify the token after matching
'ID (PACKAGEDELIM ID)*'. Is there an easier way to do this? I'd rather
not generate all those tokens just to discard them later on.

If I don't set 'token = null', I end up with an ID token, which causes
big trouble in the parser later on. That is obviously not what I want:
I ask the lexer for a BAREWORD token and get back an ID token...
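To make the desired result concrete, here is a minimal hand-written Java
sketch of what the BAREWORD rule should produce. It is not
ANTLR-generated code and does not use the ANTLR runtime; the class,
record and method names are hypothetical. The ID and PACKAGEDELIM
matches append to one shared buffer instead of emitting tokens of their
own, and '<' and '>' are consumed without being recorded, mirroring the
v2 '!' suffix.

```java
// Hypothetical hand-written scanner (NOT ANTLR-generated code) that mimics
// the desired behavior: match '<' ID ('::' ID)* '>' and return ONE
// BAREWORD token whose text keeps the IDs and '::' but drops '<' and '>'.
public class BarewordSketch {

    /** Minimal token: type name plus text (illustrative, not ANTLR's Token). */
    public record Token(String type, String text) { }

    private final String input;
    private int pos = 0;

    public BarewordSketch(String input) {
        this.input = input;
    }

    private static boolean isIdChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    /** Like ID : ('a'..'z' | 'A'..'Z' | '_')+ but appends to buf instead of emitting a token. */
    private void matchId(StringBuilder buf) {
        int start = pos;
        while (pos < input.length() && isIdChar(input.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            throw new IllegalStateException("expected ID at offset " + start);
        }
        buf.append(input, start, pos);
    }

    /** Consumes a literal without recording its text. */
    private void skip(String literal) {
        if (!input.startsWith(literal, pos)) {
            throw new IllegalStateException("expected '" + literal + "' at offset " + pos);
        }
        pos += literal.length();
    }

    /** Like BAREWORD : '<'! ID (PACKAGEDELIM ID)* '>'! ; */
    public Token bareword() {
        StringBuilder buf = new StringBuilder();
        skip("<");                      // matched, text dropped (like '<'!)
        matchId(buf);
        while (input.startsWith("::", pos)) {
            skip("::");
            buf.append("::");           // PACKAGEDELIM text is kept
            matchId(buf);
        }
        skip(">");                      // matched, text dropped (like '>'!)
        return new Token("BAREWORD", buf.toString());
    }

    public static void main(String[] args) {
        Token t = new BarewordSketch("<openBC::Debug>").bareword();
        System.out.println(t.type() + " '" + t.text() + "'");
    }
}
```

Running main prints BAREWORD 'openBC::Debug': a single token, no angle
brackets, and no separate ID or PACKAGEDELIM tokens to throw away.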

Also, is there a way to get back the behavior of EA7 when it comes to
printing the tokens of a CommonTokenStream? It used to show a lot of
extra information about the tokens. A first glance at
CommonTokenStream.java didn't reveal the secret to me :(
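If the stream only gives back the concatenated text, one possible
workaround is to walk the tokens yourself and format each one. The Java
sketch below uses a hypothetical Tok record (index, type, text, line,
column) and an assumed output format; it is not the ANTLR runtime's API
and not a reconstruction of EA7's actual output. The token types and
positions in main are made up for illustration.

```java
import java.util.List;

// Hypothetical token record and dump routine: prints each token with its
// index, type, text, line and column instead of only the concatenated text.
// Field names and output format are illustrative assumptions.
public class TokenDumpSketch {

    public record Tok(int index, String type, String text, int line, int column) {
        @Override
        public String toString() {
            return "[@" + index + ",'" + text + "',<" + type + ">," + line + ":" + column + "]";
        }
    }

    /** Returns the verbose form of every token, one per line. */
    public static String dump(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            sb.append(t).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> toks = List.of(
                new Tok(0, "PACKAGE", "package", 1, 0),
                new Tok(1, "BAREWORD", "openBC::Debug", 1, 8),
                new Tok(2, "SEMI", ";", 1, 23));
        System.out.print(dump(toks));
    }
}
```

Keeping the formatting in the token's toString means any loop over the
stream can print the verbose form without changing the stream class.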

Thanks,

Kay

