[antlr-interest] String lexing and partial tokens
Gavin Lambert
antlr at mirality.co.nz
Sat Nov 25 14:10:21 PST 2006
At 06:58 26/11/2006, Terence Parr wrote:
>
>> On an only-slightly-related note, I was also wondering what's
>> the right way to deal with lexical ambiguity? Say I've got one
>> parsing context (eg. after a #include in C) where backslashes
>> are treated literally, not as escapes, and another context
>> (anywhere else) where they should be used as an escape sequence.
>> And again, ideally I want the resulting token to contain the
>> 'real' string (ie. after escapes had been acted on). Is this
>> even possible? (I imagine you could do it by treating it as an
>> island grammar. But that seems a little heavyweight.)
>
>Easy enough, just match \ with a rule called FILENAME after
>'#include'.
So does this mean that the lexer and parser run in parallel, so
that the parser can influence the lexer? For some reason, I always
thought that the character stream was lexed completely first, and
only then were the resulting tokens parsed.
Anyway, I tried that and it gave me a warning:

  warning(208): Message.g3:99:1: The following token definitions
  are unreachable: STRING

The relevant definitions are:

  FILENAME: '"' content=UnquotedText '"'
      { emit($content); ltoken()->type = FILENAME; };
  fragment UnquotedText: (~'"')* ;
  STRING: '"' content=EscapedText '"'
      { emit($content); ltoken()->type = STRING; };
  fragment EscapedText: (EscapeSequence | ~('\\' | '"'))* ;

And yes, both FILENAME and STRING are referenced by the grammar.
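
To make the behaviour I'm after concrete, here is a minimal hand-written sketch in Python. It is not ANTLR output and does not use any ANTLR API; the names (tokenize, unescape, the WORD token, the small escape table) are all hypothetical and only illustrate the idea. The lexer itself remembers whether the previous token was #include: if so, a quoted string is taken literally as a FILENAME, otherwise escapes are processed and it becomes a STRING. It may also hint at why the warning appears: both of my rules start with '"', and nothing in the lexer rules themselves tells them apart, so presumably the first one always wins.

```python
import re

# Hypothetical escape table; a real grammar would define its own set.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

WS = re.compile(r"\s+")
INCLUDE = re.compile(r"#include\b")
RAW_STRING = re.compile(r'"([^"]*)"')                  # backslashes are literal
ESCAPED_STRING = re.compile(r'"((?:\\.|[^"\\])*)"')    # backslash starts an escape
WORD = re.compile(r'[^\s"]+')                          # catch-all for everything else


def unescape(body):
    """Replace backslash escapes in body with the characters they stand for."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append(ESCAPES.get(nxt, nxt))  # unknown escapes keep the char
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def tokenize(text):
    """Return (type, value) pairs; quoted strings depend on the previous token."""
    tokens = []
    pos = 0
    after_include = False  # the only piece of lexer state
    while pos < len(text):
        m = WS.match(text, pos)
        if m:
            pos = m.end()
            continue
        m = INCLUDE.match(text, pos)
        if m:
            tokens.append(("INCLUDE", m.group()))
            after_include = True
            pos = m.end()
            continue
        if text[pos] == '"':
            if after_include:
                m = RAW_STRING.match(text, pos)
                kind, convert = "FILENAME", (lambda s: s)
            else:
                m = ESCAPED_STRING.match(text, pos)
                kind, convert = "STRING", unescape
            if m is None:
                raise ValueError("unterminated string at offset %d" % pos)
            tokens.append((kind, convert(m.group(1))))
        else:
            m = WORD.match(text, pos)
            tokens.append(("WORD", m.group()))
        after_include = False
        pos = m.end()
    return tokens
```

Calling tokenize on a line such as #include "dir\name.h" keeps the backslash in the FILENAME value, while the same quoted text anywhere else would have \n turned into a newline in the STRING value. In both cases the token carries the 'real' string without the surrounding quotes.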