[antlr-interest] String lexing and partial tokens

Terence Parr parrt at cs.usfca.edu
Sat Nov 25 09:58:10 PST 2006


On Nov 25, 2006, at 3:56 AM, Gavin Lambert wrote:

> What's the new 3.0 way to do string lexing?  I'd like to have it  
> strip off the surrounding quotes so that the token contains just  
> the text itself.  My first attempt was this, since it's the v2 way:
>
> STRING: '"'! ( ~'"' )* '"'!	;
>
> But that gives me this error:
>
> error(149): Message.g3:101:7: rule STRING uses rewrite syntax or  
> operator with no output option or lexer rule uses !
>
> Looking in the archives seems to indicate that ! is no longer  
> supported, which is a pain in the butt.  It was a nice simple  
> syntax, and the alternatives all seem a lot more complicated.   
> Incidentally, what *is* the recommended alternative?  Further posts  
> seemed to suggest that calling $setText or setText would do the  
> trick, but those functions don't seem to exist in the C runtime  
> (which is what I'm trying to use); or at least I can't find them.

You can ask Jim Idle about the C runtime, but we decided to use methods for
setting the text rather than implementing !, which makes everything
inefficient. I could swear there was something in the documentation.
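
As a rough illustration of the "methods for setting the text" approach, here
is a minimal sketch of the STRING rule from the question with the ! operators
removed. It assumes the Java target, where the generated lexer inherits
getText() and setText(String); the equivalent call in the C runtime is not
shown here and would need to be confirmed separately.

```
// Sketch only, Java target assumed.
// Match the whole literal including the quotes, then replace the token
// text with everything between the first and last character.
STRING
    :   '"' ( ~'"' )* '"'
        { setText(getText().substring(1, getText().length() - 1)); }
    ;
```

The rule always matches at least the two quote characters, so the substring
bounds are valid even for an empty string literal.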

> On an only-slightly-related note, I was also wondering what's the  
> right way to deal with lexical ambiguity?  Say I've got one parsing  
> context (eg. after a #include in C) where backslashes are treated  
> literally, not as escapes, and another context (anywhere else)  
> where they should be used as an escape sequence.  And again,  
> ideally I want the resulting token to contain the 'real' string  
> (ie. after escapes had been acted on).  Is this even possible?  (I  
> imagine you could do it by treating it as an island grammar.  But  
> that seems a little heavyweight.)

Easy enough: after '#include', match the string with a separate rule called
FILENAME that treats \ as an ordinary character.
Ter
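
One possible reading of that suggestion, as a sketch rather than a tested
grammar: let the lexer consume '#include' together with its filename, so the
filename never competes with the ordinary STRING rule. The rule names INCLUDE
and STRING and the exact whitespace and escape handling are assumptions made
for illustration; only FILENAME comes from the reply above.

```
// Sketch only. The directive and its filename form a single token;
// inside FILENAME a backslash is just another character.
INCLUDE
    :   '#include' ( ' ' | '\t' )+ FILENAME
    ;

fragment
FILENAME
    :   '"' ( ~( '"' | '\r' | '\n' ) )* '"'
    ;

// Everywhere else a backslash introduces an escape, so an escaped
// quote does not terminate the string.
STRING
    :   '"' ( '\\' . | ~( '\\' | '"' ) )* '"'
    ;
```

This only separates the two contexts at the matching level. Turning the
escape sequences in STRING into their "real" characters would still need an
action that rewrites the token text.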

