[antlr-interest] (follow up) setting, altering text in lexer rules

Mon Jun 12 14:05:34 PDT 2006

On Jun 12, 2006, at 1:59 PM, Martin Probst wrote:

> Hi,
>
>> Ok, I propose that we take a big step back and say "you can set  
>> the text for the token manually".  You get a setText() method and  
>> the auto mechanism will see your altered text if nonnull.  If you  
>> want to build up a token piecemeal you must do so manually.  So  
>> you'd do this:
>>
>> ESC : '\\' 'n' {setText("\n");} ;
>>
>> I still need to spend time inc/dec the rule level though so I know  
>> when to emit a token.  It seems to cost a wee bit but that is ok I  
>> guess.
>
> Are you 100% sure about this? I think the "!" operator is one of  
> the most important features of ANTLR's Lexers.

Do you mean specifically ! or the ability to set/build-up the text?

> And there are cases where it's not that easy to figure out the text  
> - the user would have to re-parse the text in $getText() to get to  
> his result. That's almost certainly more expensive. Is there  
> absolutely no way of supporting this in a "if you use it you pay" way?

Yeah, I tried...there is too much work that has to be done throughout  
the lexer as soon as any ! is used in the grammar.

> Did you try StringBuilder instead? If you call .setLength(0) once  
> per token it really shouldn't matter that much except for the  
> synchronization on StringBuffer.

I have to be at least 1.4 compatible...is that 1.4?  That cost me  
200ms out of 2000 or so, which is a lot.
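
For reference, StringBuilder was added in Java 5, so code that must run on 1.4 is limited to the synchronized StringBuffer. Below is a minimal sketch of the "reset once per token" idea using StringBuffer. The class and method names are hypothetical and are not part of ANTLR.

```java
// Hypothetical helper, not ANTLR API: one buffer reused for every token.
final class TokenTextBuffer {
    // StringBuffer keeps this compatible with Java 1.4;
    // on Java 5+ a StringBuilder would avoid the synchronization cost.
    private final StringBuffer buf = new StringBuffer();

    // Call once at the start of each token: clears the contents
    // without allocating a new buffer.
    void startToken() {
        buf.setLength(0);
    }

    // Append a character that should be part of the token text.
    void append(char c) {
        buf.append(c);
    }

    // Snapshot of the text built up so far for the current token.
    String text() {
        return buf.toString();
    }
}
```

Characters matched under ! would simply never be passed to append().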

> What about the optimization of truncating start and end characters  
> simply by using different offsets? I think this is the most common  
> use case, e.g.:
>
> STRING: '\"'! CHARS '\"'!;

Yep, and a simple action with a substring will work. You probably  
want this in only 1 or 2 places in the grammar, so it should be ok  
to do manually.

setText(getText().substring(1, getText().length()-1))

or whatever.  Or, just buffer up manually in CHARS.
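
For the quoted-string case, the substring step could be sketched in plain Java as follows. This assumes the token text still contains both delimiter characters. stripDelimiters is a hypothetical helper, not an ANTLR method.

```java
// Hypothetical helper, not ANTLR API: drops the first and last character
// of a token's text, e.g. the surrounding quotes of a STRING token.
final class QuoteStripper {
    static String stripDelimiters(String text) {
        // Too short to hold both delimiters: leave the text unchanged.
        if (text == null || text.length() < 2) {
            return text;
        }
        // substring(begin, end) excludes the character at index 'end'.
        return text.substring(1, text.length() - 1);
    }
}
```

Inside a lexer action this would be used along the lines of setText(QuoteStripper.stripDelimiters(getText())).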

Ter