[antlr-interest] String lexing and partial tokens
Terence Parr
parrt at cs.usfca.edu
Sat Nov 25 09:58:10 PST 2006
On Nov 25, 2006, at 3:56 AM, Gavin Lambert wrote:
> What's the new 3.0 way to do string lexing? I'd like to have it
> strip off the surrounding quotes so that the token contains just
> the text itself. My first attempt was this, since it's the v2 way:
>
> STRING: '"'! ( ~'"' )* '"'! ;
>
> But that gives me this error:
>
> error(149): Message.g3:101:7: rule STRING uses rewrite syntax or
> operator with no output option or lexer rule uses !
>
> Looking in the archives seems to indicate that ! is no longer
> supported, which is a pain in the butt. It was a nice simple
> syntax, and the alternatives all seem a lot more complicated.
> Incidentally, what *is* the recommended alternative? Further posts
> seemed to suggest that calling $setText or setText would do the
> trick, but those functions don't seem to exist in the C runtime
> (which is what I'm trying to use); or at least I can't find them.
You can ask Jim Idle about that, but we decided to use methods for
setting the text rather than implementing !, which makes everything
inefficient. I could swear there was something in the documentation.
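For illustration, here is a minimal sketch of the method-based approach, assuming the Java target, where a lexer action can call getText() and setText() on the lexer. The rule name comes from the question; the C runtime may expose this differently, and nothing in this thread confirms its exact names.

```
// Sketch only, Java target assumed.
// Match the whole quoted string, then replace the token text
// with the same text minus the surrounding quotes.
STRING
    : '"' ( ~'"' )* '"'
      { setText(getText().substring(1, getText().length() - 1)); }
    ;
```

The action runs once after the rule has matched, so the quotes are still consumed from the input; only the text stored in the token changes.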
> On an only-slightly-related note, I was also wondering what's the
> right way to deal with lexical ambiguity? Say I've got one parsing
> context (eg. after a #include in C) where backslashes are treated
> literally, not as escapes, and another context (anywhere else)
> where they should be used as an escape sequence. And again,
> ideally I want the resulting token to contain the 'real' string
> (ie. after escapes had been acted on). Is this even possible? (I
> imagine you could do it by treating it as an island grammar. But
> that seems a little heavyweight.)
Easy enough: just match \ with a rule called FILENAME after '#include'.
Ter
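One possible reading of that suggestion, as a hedged sketch in ANTLR 3 syntax: a dedicated lexer rule matches '#include' together with the file name, so backslashes inside FILENAME are ordinary characters, while the general STRING rule treats a backslash as the start of an escape. INCLUDE_FILE and the exact character sets are assumptions for illustration; only the FILENAME name comes from the reply.

```
// Sketch only. INCLUDE_FILE is a hypothetical rule name.
// Applies only when the input starts with '#include'.
INCLUDE_FILE
    : '#include' ( ' ' | '\t' )+ FILENAME
    ;

// Backslash is just another character here.
fragment FILENAME
    : '"' ( ~( '"' | '\r' | '\n' ) )* '"'
    ;

// Everywhere else, a backslash escapes the next character.
STRING
    : '"' ( '\\' . | ~( '\\' | '"' ) )* '"'
    ;
```

Note that this only decides which characters belong to the token. Turning escape sequences into the 'real' string would still need an action that rewrites the token text.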