[antlr-interest] Manipulating text in the lexer

Thu Feb 26 15:06:06 PST 2009

Gavin Lambert wrote:
> At 04:48 27/02/2009, Sam Barnett-Cormack wrote:
>  >http://www.antlr.org/blog/antlr3/lexical.tml suggests that it's
>  >no longer possible to alter the content of a token away from
>  >what's on the input at all.
> 
> That isn't true.  You can use setText or $text = "..." to change the 
> text of a token just fine.
> 
> What's probably biting you, though, is that this only works from the 
> top-level lexer rule.  You can't change the text of a fragment (or 
> rather, you can, but it won't end up being used in the final token).
> 
>  >fragment
>  >CSTRINGNL : WSNONL* NL WSNONL* {setText("");};
> 
> So this setText will have absolutely no effect.
> 
>  >CSTRING : '"' ((CSTRINGNL)=> CSTRINGNL | '""' | ~'"') '"';
> 
> But if you called it here, then it would work.

Of course, that would be hard to structure if I have to set the text of 
the whole damn token at once. If we still had the old ! operator, it'd be 
easy enough: drop the '"' at the beginning and end, drop the first of the 
three alternatives entirely, and write the second as !'"' '"'. 
That'd be just grand. However, I can't see an easy way to get the 
resulting token text I want with the current system. If 
anyone knows how I can, please let me know.
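
One possible way to set the whole token text at once is to let the 
top-level rule match as usual and then derive the final text from the raw 
match in a small helper. The Java sketch below is only an illustration and 
not part of the ANTLR runtime: the class CStringText and its unquote method 
are hypothetical names, and it assumes WSNONL is a space or tab and NL is 
\r\n, \r or \n, since those rules are not shown in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not an ANTLR class.
final class CStringText {
    // Assumption: WSNONL is space or tab, NL is \r\n, \r or \n.
    // Mirrors the fragment rule CSTRINGNL : WSNONL* NL WSNONL*
    private static final Pattern CONTINUATION =
        Pattern.compile("[ \\t]*(?:\\r\\n|\\r|\\n)[ \\t]*");

    private CStringText() {}

    /** Turns the raw text matched by CSTRING (quotes included) into its value. */
    static String unquote(String raw) {
        // Drop the opening and closing quote.
        String body = raw.substring(1, raw.length() - 1);
        // A line break plus surrounding blanks contributes nothing.
        body = CONTINUATION.matcher(body).replaceAll("");
        // A doubled quote stands for one literal quote.
        return body.replace("\"\"", "\"");
    }
}
```

With the Java target, the CSTRING rule could then end in an action such as 
{ setText(CStringText.unquote(getText())); }. That action sits in the 
top-level rule, which is where setText is said above to take effect.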

-- 
Sam Barnett-Cormack