[antlr-interest] 4.0 daily builds

Tue Jan 3 17:48:59 PST 2012

Hi Gavin,

Sorry for the delay. I'm looking at the Lexer.nextToken method, and it looks like it tracks the starting character position in the input buffer and then, at a valid match, creates a token from the start and stop character positions. If you set the this.text field at any time during a token match, the emit() method will use that text override instead of the start and stop character positions.
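
Roughly, as a standalone toy model in Java (this is not the actual Lexer source; TokenTextModel and its method names are made up purely for illustration, and it only assumes the override-or-bounds behavior described above):

```java
// Toy model of the emit behavior described above; NOT ANTLR's Lexer code.
// Every name in this class is hypothetical.
final class TokenTextModel {
    private final String input; // the whole input buffer
    private int start;          // index of the first character of the current token
    private String text;        // override; null means "use the matched characters"

    TokenTextModel(String input) {
        this.input = input;
    }

    /** Begin a fresh token at the given index, clearing any text override. */
    void beginToken(int index) {
        start = index;
        text = null;
    }

    /** Override the token text, as an action would by assigning the text field. */
    void setText(String override) {
        text = override;
    }

    /** Text the emitted token carries when the match ends at stop (inclusive). */
    String emitText(int stop) {
        return text != null ? text : input.substring(start, stop + 1);
    }

    public static void main(String[] args) {
        TokenTextModel m = new TokenTextModel("abc def");
        m.beginToken(4);
        System.out.println(m.emitText(6)); // text taken from the start/stop positions
        m.setText("DEF");
        System.out.println(m.emitText(6)); // the override wins
    }
}
```

The point of the model is only that the override, once set, takes precedence over the matched bounds until the next token begins.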

Upon skip, everything is thrown away and matching starts again.

Upon more, it continues looking for a token without resetting the starting character position or the text field. So, if you want to modify the text for the token, such as turning \n (the 2 characters) into the actual newline character (1 character), you will need to modify the text field. But remember that it contains anything matched beforehand by rules that called more. So, if you are matching escaped characters in a string, for example, and you want to replace \n with newline, you should only change the last 2 characters of text. Do not reset text. Do:

text = text.substring(0,text.length()-2) + "\n"; 

or something like that. That is inefficient, though, so you could instead manage your own character buffer.
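
For the own-buffer route, one possibility is a StringBuilder that your actions append to as each piece of the string is matched, with the result handed to the token at the end. A standalone sketch, assuming your actions can call into a helper like this (EscapeBuffer and its methods are hypothetical names, not ANTLR API, and the set of escapes handled is just an example):

```java
// Hypothetical helper, not part of ANTLR: accumulates the decoded text of a
// string literal while the lexer matches it piece by piece.
final class EscapeBuffer {
    private final StringBuilder buf = new StringBuilder();

    /** Append characters that were matched verbatim. */
    void appendRaw(String matched) {
        buf.append(matched);
    }

    /** Append the single character that a two-character escape such as \n stands for. */
    void appendEscape(char escapeLetter) {
        switch (escapeLetter) {
            case 'n':  buf.append('\n'); break;
            case 't':  buf.append('\t'); break;
            case '\\': buf.append('\\'); break;
            case '"':  buf.append('"');  break;
            default:
                throw new IllegalArgumentException("unknown escape: \\" + escapeLetter);
        }
    }

    /** Return the accumulated text and clear the buffer for the next token. */
    String finish() {
        String result = buf.toString();
        buf.setLength(0);
        return result;
    }

    public static void main(String[] args) {
        EscapeBuffer b = new EscapeBuffer();
        b.appendRaw("ab");
        b.appendEscape('n'); // the lexer matched backslash followed by n
        b.appendRaw("c");
        System.out.println(b.finish().length()); // decoded length of the literal
    }
}
```

Appending to one StringBuilder avoids creating a new String for every escape, which is what the substring approach does.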

I am open to suggestions about how to make your life easier in this case!

Ter

On Jan 2, 2012, at 1:32 PM, Gavin Lambert wrote:

> At 11:44 2/01/2012, Terence Parr wrote:
>>> Can the special rules modify the text they're matching
>>> in terms of the text the eventual token gets?
>> 
>> sure by setting this.text.
> 
> Does that work when more() is used to tell it to return a single 
> token?  As I recall, while lexer fragment rules could set $text 
> all they wanted, it didn't actually have any effect, since the 
> final rule simply set its text based on the matched bounds, not 
> the text of its subrules.
> 
> 