[antlr-interest] More on "early commitment" problems with Lexer
Austin Hastings
Austin_Hastings at Yahoo.com
Wed Dec 5 12:20:40 PST 2007
Howdy,
I'm trying to lex quoted strings that allow backslash-escapes, but that
don't pre-define what the backslash escapes are going to be.
This is in contrast with the existing grammar examples, all of which
have code like:
fragment
EscapeSequence: Slash ('\'' | 't' | 'n' | 'r' | 'b' | '"' | ...) |
UnicodeEscape | OctalEscape | HexEscape ;
STRING_LITERAL: '"' (options {greedy=false;}: EscapeSequence | .)* '"' ;
In these examples, the list of all possible escape sequences is defined
in advance, and it seems to work.
My version looked like this:
Qlit_Single: Q (options {greedy=false;}: Slash Q | . )* Q ;
Where Q = single quote = '\''.
The problem was the "early commit" decision that Antlr3's lexer makes;
it rears its ugly head in the float vs. range issue as well. In short,
since Slash ('\\') is singled out, the generated lexer commits to the
Slash Q alternative as soon as it sees a backslash, so the only sequence
it will accept at that point is Slash-Q. The '.' any-character
alternative is never tried, so \t becomes a lexer error.
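To make the failure concrete, here is a small hand-written Java model of the behavior I'm describing. This is NOT the code Antlr generates, just a sketch of what it appears to do from the outside; the class and method names (EarlyCommitModel, matchCommitted) are made up for illustration.

```java
public class EarlyCommitModel {
    /**
     * Hypothetical model of the lexer behavior described above (not ANTLR output).
     * Returns the index just past the closing quote, or -1 for a lexer error.
     */
    public static int matchCommitted(String s) {
        if (s.isEmpty() || s.charAt(0) != '\'') {
            return -1;                       // token must start with Q
        }
        int i = 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\'') {
                return i + 1;                // closing Q ends the token
            }
            if (c == '\\') {
                // Seeing Slash commits to the "Slash Q" alternative;
                // the '.' alternative is never tried for this character.
                if (i + 1 < s.length() && s.charAt(i + 1) == '\'') {
                    i += 2;
                    continue;
                }
                return -1;                   // Slash followed by anything else
            }
            i++;                             // any other character matches '.'
        }
        return -1;                           // ran out of input before closing Q
    }

    public static void main(String[] args) {
        System.out.println(matchCommitted("'a\\'b'"));  // 6: Slash Q is accepted
        System.out.println(matchCommitted("'a\\tb'"));  // -1: Slash t is an error
    }
}
```

The second call is the \t case: once the backslash is seen there is no way back to the '.' alternative.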
I'm sure this is a design feature, and Antlr is working exactly as
designed, and this really is the way it is supposed to be, blah blah blah.
But if anyone else needs to deal with it, I thought I'd point out the
workaround:
Match this:
Qlit_Single: Q (options {greedy=false;}: Slash . | .)* Q;
The same early commitment now works in favor of the token: matching
Slash . means that whatever character follows a slash will be gobbled
up and will not interfere with the matching. In my case, that means
that Slash Q gets both the slash and the single quote into the string,
and Slash anything-else goes in as well. Postprocessing is fairly easy.
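For what it's worth, a minimal sketch of that postprocessing in Java might look like the following. It assumes the token text still includes both surrounding quotes, and it assumes the simplest possible rule, that Slash x just means the literal character x. A real language would plug its own escape table in at the marked spot. The names QlitPostprocess and unescape are invented for this example.

```java
public class QlitPostprocess {
    /**
     * Strips the surrounding quotes from a Qlit_Single token and resolves
     * backslash escapes.
     * Assumption: tokenText is the whole match, including both quotes.
     * Assumption: "\x" stands for the literal character x, for every x.
     */
    public static String unescape(String tokenText) {
        StringBuilder out = new StringBuilder();
        int end = tokenText.length() - 1;    // index of the closing quote
        for (int i = 1; i < end; i++) {
            char c = tokenText.charAt(i);
            if (c == '\\' && i + 1 < end) {
                // Skip the backslash and take the next character as-is.
                // A language-specific escape table would be consulted here.
                c = tokenText.charAt(++i);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("'a\\'b'"));   // a'b
        System.out.println(unescape("'x\\ty'"));   // xty
    }
}
```

Since the lexer rule guarantees that every Slash inside the token is followed by some character before the closing quote, the loop never has to deal with a dangling backslash.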
=Austin