[antlr-interest] (follow up) setting, altering text in lexer rules

Mon Jun 12 13:59:49 PDT 2006

Hi,

> Ok, I propose that we take a big step back and say "you can set the  
> text for the token manually".  You get a setText() method and the  
> auto mechanism will see your altered text if nonnull.  If you want  
> to build up a token piecemeal you must do so manually.  So you'd do  
> this:
>
> ESC : '\\' 'n' {setText("\n");} ;
>
> I still need to spend time inc/dec the rule level though so I know  
> when to emit a token.  It seems to cost a wee bit but that is ok I  
> guess.

Are you 100% sure about this? I think the "!" operator is one of the
most important features of ANTLR's lexers. And there are cases where
it's not that easy to figure out the text: the user would have to
re-parse the text from $getText() to get to the result. That's almost
certainly more expensive. Is there absolutely no way of supporting
this in an "if you use it, you pay" way?

Did you try StringBuilder instead of StringBuffer? If you call
setLength(0) once per token, the buffering itself really shouldn't
cost that much; the main overhead is the synchronization in
StringBuffer, which StringBuilder doesn't have.
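
To make the StringBuilder idea concrete, here is a minimal sketch. It is
not ANTLR's actual lexer code: TokenTextBuffer, startToken(), match() and
tokenText() are names I made up for illustration, and the "skip" flag
stands in for a character matched with "!".

```java
// Hypothetical sketch: one reusable, unsynchronized buffer per lexer.
final class TokenTextBuffer {
    private final StringBuilder text = new StringBuilder();

    /** Call once at the start of each token; the allocated capacity is kept. */
    void startToken() {
        text.setLength(0);
    }

    /** Record a matched character unless it was marked with "!" (skip == true). */
    void match(char c, boolean skip) {
        if (!skip) {
            text.append(c);
        }
    }

    /** Text of the current token, without the skipped characters. */
    String tokenText() {
        return text.toString();
    }

    public static void main(String[] args) {
        TokenTextBuffer buf = new TokenTextBuffer();
        buf.startToken();
        buf.match('"', true);   // opening quote, dropped
        buf.match('a', false);
        buf.match('b', false);
        buf.match('"', true);   // closing quote, dropped
        System.out.println(buf.tokenText());
    }
}
```

The buffer is allocated once and only reset per token, so the cost per
character is a single unsynchronized append.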

What about optimizing the case where only the start and end characters
are dropped, simply by adjusting the token's start and end offsets? I
think this is the most common use case, e.g.:

STRING: '\"'! CHARS '\"'!;
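
Roughly what I have in mind, as a sketch only: OffsetToken and trim() are
hypothetical names, not existing ANTLR classes. The token keeps a
reference to the input plus two offsets, and dropping the quotes just
moves the offsets instead of copying characters into a buffer.

```java
// Hypothetical sketch: token text defined by offsets into the input.
final class OffsetToken {
    private final CharSequence input;
    private final int start; // index of the first character, inclusive
    private final int stop;  // index after the last character, exclusive

    OffsetToken(CharSequence input, int start, int stop) {
        if (start < 0 || stop < start || stop > input.length()) {
            throw new IllegalArgumentException("bad offsets");
        }
        this.input = input;
        this.start = start;
        this.stop = stop;
    }

    /** New token with 'left' leading and 'right' trailing characters dropped. */
    OffsetToken trim(int left, int right) {
        if (left < 0 || right < 0 || left + right > stop - start) {
            throw new IllegalArgumentException("cannot trim that much");
        }
        return new OffsetToken(input, start + left, stop - right);
    }

    /** The text is only materialized when somebody actually asks for it. */
    String getText() {
        return input.subSequence(start, stop).toString();
    }

    public static void main(String[] args) {
        String input = "x = \"abc\";";
        // The STRING token covers the quoted literal at offsets 4 to 9.
        OffsetToken string = new OffsetToken(input, 4, 9).trim(1, 1);
        System.out.println(string.getText());
    }
}
```

This only covers "!" at the very start or end of a token; a "!" in the
middle would still need a buffer.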

Martin