[antlr-interest] String lexing and partial tokens

Jim Idle jimi at intersystems.com
Tue Nov 28 08:02:21 PST 2006


Gavin,

Fair comment, I think, though I personally prefer to see the whole parameter set. I can create another macro that makes something like that a bit easier; I will add it to my list of stuff to do. Bringing back ! is another matter, which we have discussed quite a bit and which Ter is loath to do at this point in the game ;-)

Jim



-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gavin Lambert
Sent: Monday, November 27, 2006 11:12 PM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] String lexing and partial tokens

At 16:14 27/11/2006, Jim Idle wrote:
 >The lexer emits a token automatically if you have not emitted one,
 >but if you use (C output) emitNew() in an action then it will use
 >this as the token. So, to exclude the start and end character:
 >
 >STRING: '"' (~'"')* '"'
 >	{
 >		emitNew(type,line,charPosition,channel,start,getCharIndex()-1);
 >	}

The thing is that this is a lot more parameters than I really want 
to deal with in a grammar.  It violates my "this should be simple" 
rule :)

Though I agree that having it not go allocating strings is a good 
thing, so avoiding $setText seems like a good idea.

How about something more like what I ended up hacking out, with a 
bit of extra support code to make it more palatable?  Like so:

STRING: '"' content=UnquotedText '"' { emitPartial($content); };
fragment UnquotedText: (~'"')*;

Where 'emitPartial(x);' is the equivalent of 'emit(x); 
ltoken()->setType(ltoken(), the_token_type_being_generated);'

That should be fairly simple to implement.
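
For illustration, here is a minimal, self-contained C sketch of the emitPartial idea described above. It is not the ANTLR 3 C runtime API: SketchToken, match_unquoted_text, emit_partial and lex_string are all hypothetical names, and the token is modelled simply as a type plus a pair of offsets into the input buffer. The point it tries to show is that the STRING token can reuse the span matched by the fragment and take the outer rule's type, without allocating or copying any text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token model (an assumption, not the ANTLR runtime's):
 * a type plus a half-open [start, stop) span of offsets into the input.
 * No token text is copied or allocated. */
typedef struct {
    int    type;
    size_t start;
    size_t stop;
} SketchToken;

enum { TOK_STRING = 1, TOK_UNQUOTED_TEXT = 2 };

/* Stand-in for the fragment rule  UnquotedText: (~'"')*  starting at pos. */
static SketchToken match_unquoted_text(const char *input, size_t pos)
{
    SketchToken t = { TOK_UNQUOTED_TEXT, pos, pos };
    while (input[t.stop] != '\0' && input[t.stop] != '"')
        t.stop++;
    return t;
}

/* Stand-in for emitPartial(x): keep the span of the inner token x,
 * but stamp it with the type of the rule being generated. */
static SketchToken emit_partial(SketchToken inner, int outer_type)
{
    SketchToken out = inner;
    out.type = outer_type;
    return out;
}

/* Stand-in for  STRING: '"' content=UnquotedText '"' { emitPartial($content); }
 * Returns 1 and fills *out on a match, 0 otherwise. */
static int lex_string(const char *input, size_t pos, SketchToken *out)
{
    SketchToken content;
    assert(input != NULL && out != NULL);
    if (input[pos] != '"')
        return 0;                     /* no opening quote */
    content = match_unquoted_text(input, pos + 1);
    if (input[content.stop] != '"')
        return 0;                     /* unterminated string */
    *out = emit_partial(content, TOK_STRING);
    return 1;
}
```

With this model, lexing the input "abc" (quotes included) from offset 0 would give a TOK_STRING token whose span covers only the three characters between the quotes.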

It'd be better still if the fragment weren't required, and you 
could write something like this (this generates an AST parse error 
from ANTLR at the moment):

STRING: '"' content=(~'"')* '"' { emitPartial($content); };

(maybe you'd have to have an extra set of parentheses around 
there; not sure.)

And the ultimate extension would then be to reintroduce the ! 
operator, which automatically did the above stuff if all the non-! 
components of the rule formed a contiguous block.  If they're 
non-contiguous, then it'd still be an error since you can't 
generate a single substring from the incoming char stream that way.
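
The contiguity condition above can be sketched in C as follows. This is only an illustration under stated assumptions: SketchElement and kept_span are hypothetical names, and the check here runs over the offsets each rule element matched, whereas a real implementation in ANTLR would more likely make the decision when analysing the grammar. Treating a rule with no kept elements as an error is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of what one rule element matched:
 * a half-open [start, stop) span and whether it carried '!'. */
typedef struct {
    size_t start;
    size_t stop;
    int    suppressed;   /* nonzero if the element was marked with '!' */
} SketchElement;

/* Computes the single span covered by the non-'!' elements.
 * Returns 1 and sets *start and *stop if the kept elements form one
 * contiguous block; returns 0 if they do not, or if nothing is kept.
 * The outputs are only meaningful when the return value is 1. */
static int kept_span(const SketchElement *elems, size_t count,
                     size_t *start, size_t *stop)
{
    int have_kept = 0;
    size_t i;
    assert(elems != NULL && start != NULL && stop != NULL);
    for (i = 0; i < count; i++) {
        if (elems[i].suppressed)
            continue;
        if (!have_kept) {
            /* first kept element opens the span */
            *start = elems[i].start;
            *stop = elems[i].stop;
            have_kept = 1;
        } else if (elems[i].start == *stop) {
            /* directly adjacent to the span so far: extend it */
            *stop = elems[i].stop;
        } else {
            /* a gap: no single substring covers the kept elements */
            return 0;
        }
    }
    return have_kept;
}
```

For a rule shaped like '"'! content '"'! the kept span is just the content; for a rule that keeps the two quotes but suppresses the content, the function reports an error because the kept pieces are separated by a gap.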



