[antlr-interest] Bug in C target's handling of string literals that contain escapes
Wincent Colaiuta
win at wincent.com
Thu Jun 14 10:15:09 PDT 2007
Given a lexer rule like:
FOO : '\'bar\'';
The C target will produce an array like the following:
static ANTLR3_UCHAR lit_1[] = { 0x5c, 0x27, 0x62, 0x61, 0x72,
0x5c, 0x27, ANTLR3_STRING_TERMINATOR};
Notice how each escaped single quote is encoded as two characters, 0x5c
(backslash) followed by 0x27 (single quote). As a result, when the lexer
encounters the input 'bar' (quotes included), it won't match, even though
the grammar states that it should: the matchs() lexer function just
compares strings character by character, so the encoded escape sequence
has no special meaning to it.
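To make the failure concrete, here is a small C sketch of a purely character-by-character comparison run against both arrays. It is a simplified model, not the runtime's real matchs(). The names uchar_t, STRING_TERMINATOR and simple_matchs() are hypothetical stand-ins for ANTLR3_UCHAR, ANTLR3_STRING_TERMINATOR and matchs(), and the 32-bit type and terminator value are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ANTLR3_UCHAR and ANTLR3_STRING_TERMINATOR;
   the concrete type and value are assumptions for illustration. */
typedef uint32_t uchar_t;
#define STRING_TERMINATOR ((uchar_t)0xFFFFFFFFu)

/* Hypothetical, simplified model of matchs(): compare the literal with the
   input one character at a time. No escape processing takes place, so a
   0x5c in the literal can only be matched by a real backslash in the input.
   The input must be STRING_TERMINATOR-terminated as well. */
static int simple_matchs(const uchar_t *lit, const uchar_t *input)
{
    assert(lit != NULL && input != NULL);
    for (size_t i = 0; lit[i] != STRING_TERMINATOR; i++) {
        if (input[i] != lit[i]) {
            return 0; /* mismatch: the rule fails */
        }
    }
    return 1; /* every character of the literal matched */
}

/* The input text 'bar', quotes included. */
static const uchar_t input_text[] = { 0x27, 0x62, 0x61, 0x72, 0x27,
                                      STRING_TERMINATOR };

/* The array as currently generated: each \' kept as 0x5c, 0x27. */
static const uchar_t buggy_lit[] = { 0x5c, 0x27, 0x62, 0x61, 0x72,
                                     0x5c, 0x27, STRING_TERMINATOR };

/* The same literal with the escapes decoded. */
static const uchar_t fixed_lit[] = { 0x27, 0x62, 0x61, 0x72, 0x27,
                                     STRING_TERMINATOR };
```

Under this model, simple_matchs(buggy_lit, input_text) fails on the very first character (0x5c vs 0x27), while simple_matchs(fixed_lit, input_text) succeeds.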
Correctly encoded, the array would look like this (no embedded escapes):
static ANTLR3_UCHAR lit_1[] = { 0x27, 0x62, 0x61, 0x72,
0x27, ANTLR3_STRING_TERMINATOR};
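Producing that array means decoding the literal's escape sequences before emitting it. The sketch below shows that decoding step in C. It is not ANTLR's actual code generator, and decode_literal(), uchar_t and STRING_TERMINATOR are hypothetical names. Only the common single-character escapes are handled; \uXXXX escapes are deliberately left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ANTLR3_UCHAR and ANTLR3_STRING_TERMINATOR. */
typedef uint32_t uchar_t;
#define STRING_TERMINATOR ((uchar_t)0xFFFFFFFFu)

/* Hypothetical helper: decode the body of a grammar literal (the text
   between the outer quotes, e.g. \'bar\') into the characters the lexer
   must actually match. Writes at most cap entries, terminator included.
   Returns the number of decoded characters, or -1 on an unsupported or
   truncated escape, or if out is too small. \uXXXX is not handled. */
static long decode_literal(const char *src, uchar_t *out, size_t cap)
{
    assert(src != NULL && out != NULL);
    if (cap == 0) {
        return -1; /* no room even for the terminator */
    }
    size_t n = 0;
    while (*src != '\0') {
        uchar_t c = (unsigned char)*src++;
        if (c == '\\') {
            switch (*src++) {
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            case 'b':  c = '\b'; break;
            case 'f':  c = '\f'; break;
            case '\'': c = '\''; break;
            case '"':  c = '"';  break;
            case '\\': c = '\\'; break;
            default:   return -1; /* unsupported or truncated escape */
            }
        }
        if (n + 1 >= cap) {
            return -1; /* no room for this character plus the terminator */
        }
        out[n++] = c; /* store the decoded character, not the escape */
    }
    out[n] = STRING_TERMINATOR;
    return (long)n;
}
```

Fed the literal body \'bar\', this yields 0x27, 0x62, 0x61, 0x72, 0x27 followed by the terminator, i.e. the corrected array above.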
As a workaround, the lexer rule can be rewritten so that escaped
characters appear as individual characters rather than as part of
multi-character strings; for example:
FOO : '\'' 'bar' '\'';
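The workaround avoids the problem because the only multi-character string left, 'bar', contains no escapes. The C sketch below models the rewritten rule as three consecutive matches, assuming the single-character '\'' literals end up compared as the plain code 0x27. The cursor type and the match_char(), match_string() and match_foo() functions are hypothetical simplifications, not the generated lexer's real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ANTLR3_UCHAR and ANTLR3_STRING_TERMINATOR. */
typedef uint32_t uchar_t;
#define STRING_TERMINATOR ((uchar_t)0xFFFFFFFFu)

/* Hypothetical cursor over the input, standing in for the char stream. */
typedef struct {
    const uchar_t *text; /* STRING_TERMINATOR-terminated */
    size_t pos;
} cursor_t;

/* Match one character and advance on success. */
static int match_char(cursor_t *cur, uchar_t c)
{
    if (cur->text[cur->pos] != c) {
        return 0;
    }
    cur->pos++;
    return 1;
}

/* Match a multi-character literal character by character. */
static int match_string(cursor_t *cur, const uchar_t *lit)
{
    size_t i;
    for (i = 0; lit[i] != STRING_TERMINATOR; i++) {
        if (cur->text[cur->pos + i] != lit[i]) {
            return 0;
        }
    }
    cur->pos += i;
    return 1;
}

/* Literal for 'bar': it has no escapes, so its encoding is already right. */
static const uchar_t lit_bar[] = { 0x62, 0x61, 0x72, STRING_TERMINATOR };

/* FOO : '\'' 'bar' '\'' ; as quote, string, quote. */
static int match_foo(cursor_t *cur)
{
    assert(cur != NULL);
    return match_char(cur, 0x27)
        && match_string(cur, lit_bar)
        && match_char(cur, 0x27);
}
```

With the input 'bar' (quotes included), match_foo() succeeds and leaves the cursor at position 5, just past the closing quote.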
Cheers,
Wincent