[antlr-interest] String lexing and partial tokens
Terence Parr
parrt at cs.usfca.edu
Tue Nov 28 10:15:42 PST 2006
Hi Gang,
We will reinvestigate '!' after 3.0 is done and the book is out. I
am not opposed to this, I just did not have time to find a way to
optimize things for the moment.
Ter
On Nov 28, 2006, at 8:02 AM, Jim Idle wrote:
> Gavin,
>
> Fair comment I think, though I personally prefer to see the whole
> parameter set. I can create another macro that makes something like
> that a bit easier. I will add it to my list of stuff to do. The
> reintroduction of ! is another matter, which we have discussed
> quite a bit and Ter is loath to reintroduce at this point in the
> game ;-)
>
> Jim
>
>
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gavin Lambert
> Sent: Monday, November 27, 2006 11:12 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] String lexing and partial tokens
>
> At 16:14 27/11/2006, Jim Idle wrote:
>> The lexer emits a token automatically if you have not emitted one,
>> but if you use (C output) emitNew() in an action then it will use
>> this as the token. So, to exclude the start and end character:
>>
>> STRING: '"' (~'"')* '"'
>> {
>>   emitNew(type,line,charPosition,channel,start,getCharIndex()-1);
>> }
>
> The thing is that this is a lot more parameters than I really want
> to deal with in a grammar. It violates my "this should be simple"
> rule :)
>
> Though I agree that having it not go allocating strings is a good
> thing, so avoiding $setText seems like a good idea.
>
> How about something more like what I ended up hacking out, with a
> bit of extra support code to make it more palatable? Like so:
>
> STRING: '"' content=UnquotedText '"' { emitPartial($content); };
> fragment UnquotedText: (~'"')*;
>
> Where 'emitPartial(x);' is the equivalent of 'emit(x);
> ltoken()->setType(ltoken(), the_token_type_being_generated);'
>
> That should be fairly simple to implement.
>
> It'd be better still if the fragment weren't required, and you
> could write something like this (this generates an AST parse error
> from ANTLR at the moment):
>
> STRING: '"' content=(~'"')* '"' { emitPartial($content); };
>
> (maybe you'd have to have an extra set of parentheses around
> there; not sure.)
>
> And the ultimate extension would then be to reintroduce the !
> operator, which automatically did the above stuff if all the non-!
> components of the rule formed a contiguous block. If they're
> non-contiguous, then it'd still be an error since you can't
> generate a single substring from the incoming char stream that way.
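
To illustrate why the kept part of the rule has to be contiguous, here is a
minimal sketch in plain C of a token that is just a [start, stop) window into
the incoming character buffer, so the quotes can be excluded without
allocating a new string. This is not the ANTLR C runtime API: PartialToken,
TOKEN_STRING, lex_string, token_length and token_equals are hypothetical names
invented for this sketch, and it assumes the input is a NUL-terminated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token type (not ANTLR's): a half-open [start, stop)
 * window into the input buffer. No text is copied. */
typedef struct {
    int    type;
    size_t start;  /* index of the first character of the token text */
    size_t stop;   /* index one past the last character of the text  */
} PartialToken;

enum { TOKEN_STRING = 1 };  /* hypothetical token type value */

/* Match '"' (~'"')* '"' starting at input[pos], which must lie within
 * the NUL-terminated buffer. On success, *tok spans only the characters
 * between the quotes and the return value is the index just past the
 * closing quote. Returns 0 if there is no opening quote at pos or the
 * string is unterminated; *tok is left untouched in that case. */
size_t lex_string(const char *input, size_t pos, PartialToken *tok)
{
    assert(input != NULL && tok != NULL);

    if (input[pos] != '"')
        return 0;

    size_t i = pos + 1;
    while (input[i] != '\0' && input[i] != '"')
        i++;

    if (input[i] != '"')
        return 0;               /* hit end of input before closing quote */

    tok->type  = TOKEN_STRING;
    tok->start = pos + 1;       /* skip the opening quote */
    tok->stop  = i;             /* stop before the closing quote */
    return i + 1;
}

/* Number of characters in the token text. */
size_t token_length(const PartialToken *tok)
{
    return tok->stop - tok->start;
}

/* Compare the token text against a C string without copying it. */
int token_equals(const char *input, const PartialToken *tok,
                 const char *expected)
{
    size_t n = token_length(tok);
    return strlen(expected) == n &&
           memcmp(input + tok->start, expected, n) == 0;
}
```

Because the token is a single pair of indices, it can only describe one
contiguous run of characters; a rule whose kept pieces were separated by a
dropped piece could not be represented this way without copying.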
>
>