[antlr-interest] HowTo manipulate returned Token value?

Fri May 24 14:08:31 PDT 2002

Hi All,

I suspect it should be possible to manipulate the string value 
associated with a Token before the lexer returns it, and perhaps to 
insert additional tokens too.  ;-)

I am trying to deal with the C preprocessor and I want my 
CLangPreprocessorLexer to be able to return tokens for preprocessor 
directives.

Given the following definition,

PRE_DEFINE
   : (PRE_WS)* '#' (PRE_WS)* "define" (PRE_WS)+ PRE_IDENT (PRE_WS)+
     (PRE_DEFINE_PARAMS)? (PRE_WS)+ PRE_DEFINE_TOKENSTRING NEWLINE
   ;

ANTLR returns the whole line - including the NEWLINE char - as the 
value associated with token PRE_DEFINE. Can I manipulate the textual 
value associated with the tokens in the Lexer before they are 
returned?

Ideally I would like the lexer to return, in order:
   PRE_DEFINE<"">                            then
   PRE_DEFINE_IDENT<ident-val>               then
   PRE_DEFINE_PARAMS<param-string>           then
   PRE_DEFINE_TOKENSTRING<token-string>      then
   PRE_NEWLINE
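
One way to picture the sequence above is as a list of tokens that a 
lexer action builds from the matched line and then hands out one at a 
time. The sketch below is plain Java and does not use ANTLR at all; 
PreToken and DefineSplitter are hypothetical names, and the regular 
expression is a simplification that assumes a single-line #define with 
no continuation lines or comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token: a type name plus its text. Not ANTLR's Token class.
final class PreToken {
    final String type;
    final String text;

    PreToken(String type, String text) {
        this.type = type;
        this.text = text;
    }

    @Override
    public String toString() {
        return type + "<" + text + ">";
    }
}

final class DefineSplitter {
    // Groups: 1 = identifier, 2 = optional parameter list, 3 = replacement text.
    // Assumption: the whole directive sits on one line.
    private static final Pattern DEFINE = Pattern.compile(
        "^[ \\t]*#[ \\t]*define[ \\t]+([A-Za-z_][A-Za-z0-9_]*)"
        + "(\\([^)]*\\))?[ \\t]*(.*?)[ \\t]*\\r?\\n?$");

    // Splits one #define line into the token sequence described in the post.
    static List<PreToken> split(String line) {
        Matcher m = DEFINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a #define line: " + line);
        }
        List<PreToken> out = new ArrayList<>();
        out.add(new PreToken("PRE_DEFINE", ""));
        out.add(new PreToken("PRE_DEFINE_IDENT", m.group(1)));
        if (m.group(2) != null) {
            // Emitted only for function-like macros.
            out.add(new PreToken("PRE_DEFINE_PARAMS", m.group(2)));
        }
        out.add(new PreToken("PRE_DEFINE_TOKENSTRING", m.group(3)));
        out.add(new PreToken("PRE_NEWLINE", ""));
        return out;
    }
}
```

In a real lexer the list would have to be drained by whatever method 
the parser calls for its next token; how that is wired into ANTLR is 
not shown here.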

ADDITIONALLY...

I am working on two Parsers that would share the Lexer -- one that 
cares about preprocessor stuff and one that doesn't. I can't just 
ignore all PRE_xxxx tokens in the second Parser, as that might result 
in the Parser seeing code that the PRE_xxxx tokens would have flagged 
as conditionally excluded.

Can multiple Lexers be arranged as streams of "filters"? I might be 
able to code a CLangPreprocessorStripperLexer that feeds on the output 
of the first.

Or do I have no choice but to develop two versions of the Lexer, or 
to have both my Parsers be aware of PRE_xxxx tokens?
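
The filter arrangement asked about above can be sketched as a stage 
that pulls tokens from an upstream source and exposes the same 
"next token" method downstream. ANTLR 2 has an antlr.TokenStream 
interface with a nextToken() method that plays this role; the code 
below does not use it, and instead defines its own stand-ins (Tok, 
TokSource, ListSource) so that it runs without the library. The 
PRE_IFDEF and PRE_ENDIF token types and the set of defined macro names 
are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.Set;

// Hypothetical stand-ins so the sketch runs without antlr.jar.
final class Tok {
    final String type;
    final String text;

    Tok(String type, String text) {
        this.type = type;
        this.text = text;
    }
}

interface TokSource {
    Tok nextToken(); // returns a token of type "EOF" at end of input
}

// Plays the role of the first lexer: replays a fixed list of tokens.
final class ListSource implements TokSource {
    private final Iterator<Tok> it;

    ListSource(Tok... toks) {
        this.it = Arrays.asList(toks).iterator();
    }

    public Tok nextToken() {
        return it.hasNext() ? it.next() : new Tok("EOF", "");
    }
}

// Filter stage: consumes every PRE_xxxx token and passes on only the
// ordinary tokens that lie outside conditionally excluded regions.
final class PreprocessorStripper implements TokSource {
    private final TokSource upstream;
    private final Set<String> defined;
    // One entry per open PRE_IFDEF: true if that region is included.
    private final Deque<Boolean> active = new ArrayDeque<>();

    PreprocessorStripper(TokSource upstream, Set<String> defined) {
        this.upstream = upstream;
        this.defined = defined;
    }

    // Tokens are emitted only if every enclosing region is included.
    private boolean emitting() {
        return !active.contains(Boolean.FALSE);
    }

    public Tok nextToken() {
        while (true) {
            Tok t = upstream.nextToken();
            switch (t.type) {
                case "EOF":
                    return t;
                case "PRE_IFDEF": // hypothetical: text holds the macro name
                    active.push(defined.contains(t.text));
                    break;
                case "PRE_ENDIF": // hypothetical
                    if (!active.isEmpty()) {
                        active.pop();
                    }
                    break;
                default:
                    if (t.type.startsWith("PRE_")) {
                        break; // drop any other preprocessor token
                    }
                    if (emitting()) {
                        return t;
                    }
            }
        }
    }
}
```

With this shape the preprocessor-aware Parser would read from the first 
Lexer directly, while the other Parser would read from the stripper, so 
only one Lexer would need to be maintained.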

Micheal
