[antlr-interest] String lexing and partial tokens
Robert Hill
rob.hill at blueyonder.co.uk
Sat Nov 25 14:51:40 PST 2006
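I think you need to put the keyword in front of the filename inside the
FILENAME rule, to differentiate it from the STRING rule. As written, both
rules start with '"', so the lexer has nothing to tell them apart.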
FILENAME: 'include' '"' content=UnquotedText '"' { emit($content);
ltoken()->type = FILENAME; };
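
To show the idea outside ANTLR, here is a rough hand-written Python sketch (not
ANTLR output and not the C target). It only illustrates why folding the keyword
into the FILENAME rule removes the ambiguity: the filename pattern is tried
first and can only match right after the keyword, so everything else that
starts with '"' falls through to STRING. The names tokenize, unescape and
ESCAPES are made up for this example, the escape table is an assumption (the
thread never shows EscapeSequence), and I have assumed the keyword is spelled
'#include' with optional spaces or tabs before the quote.

```python
import re

# Hypothetical escape table; the real EscapeSequence rule may differ.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

# Keyword plus quoted text: backslashes inside the quotes are kept literally.
FILENAME_RE = re.compile(r'#include[ \t]*"([^"]*)"')
# Plain quoted text: a backslash always consumes the following character.
STRING_RE = re.compile(r'"((?:\\.|[^"\\])*)"')
# Anything else up to the next whitespace or quote.
OTHER_RE = re.compile(r'[^\s"]+')


def unescape(text):
    """Replace each backslash escape with the character it denotes."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), text)


def tokenize(source):
    """Return (type, text) pairs; text is the content without the quotes."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        # FILENAME is tried first, but it can only match at the keyword.
        m = FILENAME_RE.match(source, pos)
        if m:
            tokens.append(("FILENAME", m.group(1)))
        else:
            m = STRING_RE.match(source, pos)
            if m:
                tokens.append(("STRING", unescape(m.group(1))))
            else:
                m = OTHER_RE.match(source, pos)
                if m is None:
                    raise ValueError(f"unterminated string at offset {pos}")
                tokens.append(("OTHER", m.group(0)))
        pos = m.end()
    return tokens
```

Calling tokenize on a line such as #include "dir\name.h" gives a single
FILENAME token with the backslash left alone, while a quoted string anywhere
else comes back as a STRING token with its escapes already applied.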
/2ob
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gavin Lambert
> Sent: 25 November 2006 22:10
> To: Terence Parr
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] String lexing and partial tokens
>
> At 06:58 26/11/2006, Terence Parr wrote:
> >
> >> On an only-slightly-related note, I was also wondering what's
> >> the right way to deal with lexical ambiguity? Say I've got one
> >> parsing context (eg. after a #include in C) where backslashes
> >> are treated literally, not as escapes, and another context
> >> (anywhere else) where they should be used as an escape sequence.
> >> And again, ideally I want the resulting token to contain the
> >> 'real' string (ie. after escapes had been acted on). Is this
> >> even possible? (I imagine you could do it by treating it as an
> >> island grammar. But that seems a little heavyweight.)
> >
> >Easy enough, just match \ with a rule called FILENAME after
> >'#include'.
>
> So, this would mean that the lexer and grammar are run in
> parallel, so that the grammar can influence the lexer? For some
> reason, I always thought that the character stream was completely
> lexed, and then the resulting tokens were parsed.
>
> Anyway, I tried that and it gave me a warning:
>
> warning(208): Message.g3:99:1: The following token definitions are
> unreachable: STRING
>
> The relevant definitions are:
>
> FILENAME: '"' content=UnquotedText '"' { emit($content);
> ltoken()->type = FILENAME; };
>
> fragment UnquotedText: (~'"')* ;
>
> STRING: '"' content=EscapedText '"' { emit($content);
> ltoken()->type = STRING; };
>
> fragment EscapedText: (EscapeSequence | ~('\\' | '"'))* ;
>
>
> And yes, both FILENAME and STRING are referenced by the grammar.