[antlr-interest] Bug in C target's handling of string literals that contain escapes

Wincent Colaiuta win at wincent.com
Thu Jun 14 10:15:09 PDT 2007


Given a lexer rule like:

   FOO : '\'bar\'';

The C target will produce an array like the following:

   static ANTLR3_UCHAR lit_1[] = { 0x5c, 0x27, 0x62, 0x61, 0x72,
                                   0x5c, 0x27, ANTLR3_STRING_TERMINATOR };

Notice that each escaped single quote is emitted as two characters,
0x5c (backslash) followed by 0x27 (single quote), instead of as the
single character 0x27. As a result, when the lexer encounters the
input 'bar' it does not match, even though the grammar says it should:
the matchs() lexer function simply compares the input against the
array character by character, so the encoded escape sequence has no
special meaning to it.

Correctly encoded, the array would look like this (no embedded backslashes):

   static ANTLR3_UCHAR lit_1[] = { 0x27, 0x62, 0x61, 0x72,
                                   0x27, ANTLR3_STRING_TERMINATOR };
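
To make the failure concrete, here is a minimal standalone sketch of a
character-by-character literal comparison. It is not the runtime's
actual matchs() code: UCHAR, TERMINATOR, matches_literal() and
demo_checks() are hypothetical stand-ins invented for illustration,
and the input is assumed to be a plain NUL-terminated C string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t UCHAR;            /* stand-in for ANTLR3_UCHAR (assumption) */
#define TERMINATOR 0xFFFFFFFFu     /* stand-in for ANTLR3_STRING_TERMINATOR (assumption) */

/* Array as currently generated: each \' is stored as backslash + quote. */
static const UCHAR lit_escaped[] = { 0x5c, 0x27, 0x62, 0x61, 0x72,
                                     0x5c, 0x27, TERMINATOR };

/* Array as it should be generated: each \' is stored as a single quote. */
static const UCHAR lit_fixed[]   = { 0x27, 0x62, 0x61, 0x72,
                                     0x27, TERMINATOR };

/*
 * Hypothetical simplified matcher: returns 1 if the input starts with
 * every character of lit, 0 otherwise. Characters are compared one by
 * one with no interpretation of escape sequences.
 */
int matches_literal(const UCHAR *lit, const char *input)
{
    size_t i = 0;

    while (lit[i] != TERMINATOR) {
        /* Stop on end of input or on the first differing character. */
        if (input[i] == '\0' || (UCHAR)(unsigned char)input[i] != lit[i])
            return 0;
        i++;
    }
    return 1;
}

/* Exercises both arrays against the input 'bar' (quotes included). */
void demo_checks(void)
{
    assert(matches_literal(lit_fixed,   "'bar'") == 1);  /* intended behaviour */
    assert(matches_literal(lit_escaped, "'bar'") == 0);  /* the reported bug   */
}
```

With the escaped array, the very first comparison fails because the
input starts with 0x27 while the array starts with 0x5c; the only
input it would accept is one that literally contains the backslashes.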

As a workaround, the lexer rule can be rewritten so that each escaped
character appears as its own single-character literal rather than
inside a multi-character string; for example:

   FOO : '\'' 'bar' '\'';

Cheers,
Wincent


