[antlr-interest] String lexing and partial tokens

Terence Parr parrt at cs.usfca.edu
Tue Nov 28 10:15:42 PST 2006


Hi Gang,

We will reinvestigate '!' after 3.0 is done and the book is out. I
am not opposed to this; I just have not had time to find a way to
optimize things for the moment.

Ter


On Nov 28, 2006, at 8:02 AM, Jim Idle wrote:

> Gavin,
>
> Fair comment I think, though I personally prefer to see the whole  
> parameter set. I can create another macro that makes something like  
> that a bit easier. I will add it to my list of stuff to do. The  
> reintroduction of ! is another matter, which we have discussed  
> quite a bit and Ter is loath to reintroduce at this point in the
> game ;-)
>
> Jim
>
>
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gavin Lambert
> Sent: Monday, November 27, 2006 11:12 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] String lexing and partial tokens
>
> At 16:14 27/11/2006, Jim Idle wrote:
>> The lexer emits a token automatically if you have not emitted one,
>> but if you use (C output) emitNew() in an action then it will use
>> this as the token. So, to exclude the start and end character:
>>
>> STRING: '"' (~'"')* '"'
>> 	{
>> 		emitNew(type,line,charPosition,channel,start,getCharIndex()-1);
>> 	}
>
> The thing is that this is a lot more parameters than I really want
> to deal with in a grammar.  It violates my "this should be simple"
> rule :)
>
> Though I agree that having it not go allocating strings is a good
> thing, so avoiding $setText seems like a good idea.
>
> How about something more like what I ended up hacking out, with a
> bit of extra support code to make it more palatable?  Like so:
>
> STRING: '"' content=UnquotedText '"' { emitPartial($content); };
> fragment UnquotedText: (~'"')*;
>
> Where 'emitPartial(x);' is the equivalent of 'emit(x);
> ltoken()->setType(ltoken(), the_token_type_being_generated);'
>
> That should be fairly simple to implement.
>
> It'd be better still if the fragment weren't required, and you
> could write something like this (this generates an AST parse error
> from ANTLR at the moment):
>
> STRING: '"' content=(~'"')* '"' { emitPartial($content); };
>
> (maybe you'd have to have an extra set of parentheses around
> there; not sure.)
>
> And the ultimate extension would then be to reintroduce the !
> operator, which automatically did the above stuff if all the non-!
> components of the rule formed a contiguous block.  If they're
> non-contiguous, then it'd still be an error since you can't
> generate a single substring from the incoming char stream that way.
>
>
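For readers following the thread, here is a minimal C sketch of the two
ideas discussed above: a "partial" token that points into the input
buffer instead of allocating a string, and the contiguity check a
reintroduced '!' operator would need. This is not ANTLR runtime code.
PartialToken, TOKEN_STRING, lex_string and kept_is_contiguous are
hypothetical names invented for illustration, and the input is assumed
to be a NUL-terminated char buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token: a view into the input buffer, no copied text. */
typedef struct {
    int         type;    /* token type, 0 means "no token" */
    const char *text;    /* points into the input; not NUL-terminated */
    size_t      length;  /* number of characters in the token text */
} PartialToken;

enum { TOKEN_STRING = 1 };  /* assumed token type for this sketch */

/* Scan a double-quoted string starting at input[pos] and return a token
 * covering only the characters between the quotes. On return, *next is
 * the index just past the closing quote. An unterminated string yields
 * a token with type 0 and *next set to the end of the input. */
static PartialToken lex_string(const char *input, size_t pos, size_t *next)
{
    PartialToken tok = { 0, NULL, 0 };
    assert(input[pos] == '"');

    size_t start = pos + 1;                       /* skip opening quote */
    const char *close = strchr(input + start, '"');
    if (close == NULL) {
        *next = strlen(input);
        return tok;
    }

    tok.type   = TOKEN_STRING;
    tok.text   = input + start;
    tok.length = (size_t)(close - (input + start));
    *next      = (size_t)(close - input) + 1;     /* skip closing quote */
    return tok;
}

/* Return 1 if the kept (non-'!') elements of a rule form one contiguous
 * run, else 0. kept[i] is nonzero when element i is kept. A rule with
 * no kept elements counts as contiguous (its text would be empty). */
static int kept_is_contiguous(const int *kept, size_t n)
{
    size_t i = 0;
    while (i < n && !kept[i]) i++;  /* leading dropped elements */
    while (i < n && kept[i])  i++;  /* the single kept run */
    while (i < n && !kept[i]) i++;  /* trailing dropped elements */
    return i == n;                  /* anything left means a second run */
}
```

For a rule shaped like '"'! content '"'! the kept flags would be
{0, 1, 0}, which passes the check, while {1, 0, 1} fails because no
single substring of the char stream covers both kept pieces.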


