[antlr-interest] Bug in C target's handling of string literals that contain escapes
Jim Idle
jimi at temporal-wave.com
Thu Jun 14 11:39:56 PDT 2007
Yes - I am aware of this one...
Jim
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Wincent Colaiuta
> Sent: Thursday, June 14, 2007 10:15 AM
> To: ANTLR mail-list
> Subject: [antlr-interest] Bug in C target's handling of string literals
> that contain escapes
>
> Given a lexer rule like:
>
> FOO : '\'bar\'';
>
> The C target will produce an array like the following:
>
> static ANTLR3_UCHAR lit_1[] = { 0x5c, 0x27, 0x62, 0x61, 0x72,
> 0x5c, 0x27, ANTLR3_STRING_TERMINATOR};
>
> Notice how each escaped single-quote is encoded as 0x5c, 0x27... As a
> result when the lexer encounters a string like 'bar' it won't match
> even though the grammar states that it should (the matchs() lexer
> function just compares strings character by character, so the encoded
> escape sequence has no special meaning to it).
>
> Correctly encoded, the array would look like this (no embedded
> escapes):
>
> static ANTLR3_UCHAR lit_1[] = { 0x27, 0x62, 0x61, 0x72,
> 0x27, ANTLR3_STRING_TERMINATOR};
>
> As a workaround, the lexer rule can be rewritten so that escaped
> characters appear as individual characters and not as part of multi-
> character strings; for example:
>
> FOO : '\'' 'bar' '\'';
>
> Cheers,
> Wincent
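
For anyone who wants to see the mismatch in isolation, here is a minimal sketch of a character-by-character comparison like the one described above. It is not the runtime's actual matchs() code: `uchar_t`, `STRING_TERMINATOR` and `match_literal` are hypothetical stand-ins for illustration, and only the two array contents are taken from the report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ANTLR3_UCHAR and ANTLR3_STRING_TERMINATOR;
 * the real definitions live in the ANTLR3 C runtime headers. */
typedef uint32_t uchar_t;
#define STRING_TERMINATOR 0xFFFFFFFFu

/* Returns 1 if `input` starts with every character of `literal`, else 0.
 * The comparison is purely character by character, so a backslash in the
 * literal is just another character that must appear in the input. */
static int match_literal(const char *input, const uchar_t *literal)
{
    size_t i;

    for (i = 0; literal[i] != STRING_TERMINATOR; i++) {
        if (input[i] == '\0') {
            return 0;   /* input ended before the literal did */
        }
        if ((uchar_t)(unsigned char)input[i] != literal[i]) {
            return 0;   /* first differing character */
        }
    }
    return 1;
}

/* Array as generated for '\'bar\'' according to the report (escapes kept). */
static const uchar_t lit_escaped[] = {
    0x5c, 0x27, 0x62, 0x61, 0x72, 0x5c, 0x27, STRING_TERMINATOR
};

/* Array as it should look once the escapes are decoded. */
static const uchar_t lit_decoded[] = {
    0x27, 0x62, 0x61, 0x72, 0x27, STRING_TERMINATOR
};
```

With these definitions, `match_literal("'bar'", lit_escaped)` fails on the very first character (0x27 in the input against 0x5c in the array), while `match_literal("'bar'", lit_decoded)` succeeds. The escaped array only matches input that contains literal backslashes.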