[antlr-interest] Bug in C target's handling of string literals that contain escapes

Thu Jun 14 11:39:56 PDT 2007

Yes - I am aware of this one...

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Wincent Colaiuta
> Sent: Thursday, June 14, 2007 10:15 AM
> To: ANTLR mail-list
> Subject: [antlr-interest] Bug in C target's handling of string literals
> that contain escapes
> 
> Given a lexer rule like:
> 
>    FOO : '\'bar\'';
> 
> The C target will produce an array like the following:
> 
>    static ANTLR3_UCHAR     lit_1[]  = { 0x5c, 0x27, 0x62, 0x61, 0x72,
> 0x5c, 0x27,  ANTLR3_STRING_TERMINATOR};
> 
> Notice how each escaped single quote is encoded as 0x5c, 0x27. As a
> result, when the lexer encounters a string like 'bar' it won't match,
> even though the grammar states that it should (the matchs() lexer
> function just compares strings character by character, so the encoded
> escape sequence has no special meaning to it).
> 
> Correctly encoded, the array would look like this (no embedded escapes):
> 
>    static ANTLR3_UCHAR     lit_1[]  = { 0x27, 0x62, 0x61, 0x72,
> 0x27,  ANTLR3_STRING_TERMINATOR};
> 
> As a workaround, the lexer rule can be rewritten so that escaped
> characters appear as individual characters and not as part of
> multi-character strings; for example:
> 
>    FOO : '\'' 'bar' '\'';
> 
> Cheers,
> Wincent
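
For anyone who wants to see the mismatch outside of a generated lexer, here
is a minimal sketch of the comparison described above. It is not the
runtime's matchs() implementation: UCHAR, STRING_TERMINATOR and
matches_literal() are hypothetical stand-ins chosen for illustration, and
only the two array contents are taken from the report.

```c
#include <stddef.h>

/* Stand-ins for the runtime's types (assumptions, not the real definitions). */
typedef unsigned int UCHAR;
#define STRING_TERMINATOR 0xFFFFFFFFu

/* The array as reported: each escaped quote is stored as backslash + quote. */
static const UCHAR lit_escaped[] = { 0x5c, 0x27, 0x62, 0x61, 0x72,
                                     0x5c, 0x27, STRING_TERMINATOR };

/* The array as it would look with the escapes decoded. */
static const UCHAR lit_decoded[] = { 0x27, 0x62, 0x61, 0x72,
                                     0x27, STRING_TERMINATOR };

/*
 * Hypothetical helper: returns 1 if `input` starts with every element of
 * `lit`, comparing one character at a time and giving escapes no special
 * meaning, otherwise returns 0.
 */
static int matches_literal(const char *input, const UCHAR *lit)
{
    size_t i;

    for (i = 0; lit[i] != STRING_TERMINATOR; i++) {
        /* Stop at end of input or at the first differing character. */
        if (input[i] == '\0' || (UCHAR)(unsigned char)input[i] != lit[i]) {
            return 0;
        }
    }
    return 1;
}
```

With the input "'bar'", matches_literal() returns 0 for lit_escaped (the
first element is 0x5c, but the input starts with 0x27) and 1 for
lit_decoded, which is the behaviour the report describes.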