[antlr-interest] (follow up) setting, altering text in lexer rules
Terence Parr
parrt at cs.usfca.edu
Mon Jun 12 14:05:34 PDT 2006
On Jun 12, 2006, at 1:59 PM, Martin Probst wrote:
> Hi,
>
>> Ok, I propose that we take a big step back and say "you can set
>> the text for the token manually". You get a setText() method and
>> the auto mechanism will see your altered text if nonnull. If you
>> want to build up a token piecemeal you must do so manually. So
>> you'd do this:
>>
>> ESC : '\\' 'n' {setText("\n");} ;
>>
>> I still need to spend time inc/dec the rule level though so I know
>> when to emit a token. It seems to cost a wee bit but that is ok I
>> guess.
>
> are you 100% sure about this? I think the "!" operator is one of
> the most important features of ANTLR's Lexers.
Do you mean specifically ! or the ability to set/build-up the text?
> And there are cases where it's not that easy to figure out the text
> - the user would have to re-parse the text in $getText() to get to
> his result. That's almost certainly more expensive. Is there
> absolutely no way of supporting this in a "if you use it you pay" way?
Yeah, I tried...there is too much work that must be done as soon as
any ! is used in the grammar.
> Did you try StringBuilder instead? If you call .setLength(0) once
> per token it really shouldn't matter that much except for the
> synchronization on StringBuffer.
I have to be at least 1.4 compatible...is that 1.4? That cost me
200ms out of 2000 or so, which is a lot.
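
For reference, the buffer-reuse idea Martin describes can be sketched roughly as below. This is only an illustration, not ANTLR code: the class TokenTextBuffer and its methods are hypothetical names. It uses StringBuffer because StringBuilder was only added in Java 5, so it is not available on 1.4.

```java
// Hypothetical helper, not part of ANTLR: one reusable text buffer per lexer.
final class TokenTextBuffer {
    // StringBuffer is available on Java 1.4; StringBuilder arrived in Java 5.
    private final StringBuffer buf = new StringBuffer();

    // Call once at the start of every token; the allocated capacity is kept.
    void startToken() {
        buf.setLength(0);
    }

    void append(char c) {
        buf.append(c);
    }

    String text() {
        return buf.toString();
    }

    // Builds the text of each word in turn, reusing the same buffer.
    static String[] collect(String[] words) {
        TokenTextBuffer b = new TokenTextBuffer();
        String[] out = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            b.startToken();
            for (int j = 0; j < words[i].length(); j++) {
                b.append(words[i].charAt(j));
            }
            out[i] = b.text();
        }
        return out;
    }
}
```

Whether this is cheap enough depends on how often characters are appended; the sketch says nothing about the 200ms figure above.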
> What about the optimization of truncating start and end characters
> simply by using different offsets? I think this is the most common
> use case, e.g.:
>
> STRING: '\"'! CHARS '\"'!;
Yep, and a simple action with a substring will work. You only want this
in 1 or 2 places in the grammar, so it should be ok to do manually:
setText(getText().substring(1, getText().length()-1))
or whatever. Or, just buffer up manually in CHARS.
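
A rough sketch of the two ways of dropping the delimiters discussed here, in plain Java. QuoteStripper and OffsetToken are hypothetical names made up for illustration, not ANTLR classes. The first copies the text with substring; the second follows Martin's suggestion and only moves the start and end offsets into the input.

```java
// Hypothetical helpers, not part of ANTLR.
final class QuoteStripper {
    // Drops the first and last character, e.g. the quotes around a STRING token.
    static String stripDelimiters(String text) {
        if (text.length() < 2) {
            return text; // too short to have two delimiters
        }
        return text.substring(1, text.length() - 1);
    }
}

// A token that only records offsets into the input; stop is exclusive.
final class OffsetToken {
    final String input;
    final int start;
    final int stop;

    OffsetToken(String input, int start, int stop) {
        this.input = input;
        this.start = start;
        this.stop = stop;
    }

    // Same token with one character trimmed from each end; no text is copied here.
    // Assumes the token is at least two characters long.
    OffsetToken trimmed() {
        return new OffsetToken(input, start + 1, stop - 1);
    }

    // The text is only materialized when somebody asks for it.
    String getText() {
        return input.substring(start, stop);
    }
}
```

The offset version only helps when the characters to drop sit at the ends of the token; an escape like the ESC rule above still needs the text to be rebuilt.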
Ter