[antlr-interest] We Need UTF-8 String Literal Support for C Parser/Lexer

Jim Idle jimi at temporal-wave.com
Tue Jul 17 14:48:27 PDT 2012


You need to run version 3.4 and set the input encoding to UTF-8.
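
To narrow down whether TinyXML or ANTLR is dropping the bytes, it can help to dump the same literal at each stage and compare. What follows is only a minimal sketch in plain C, not part of either library: the helper names utf8_code_points and dump_utf8 are hypothetical, and the code assumes well-formed, NUL-terminated UTF-8 without validating it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by counting
 * every byte that is not a continuation byte (bit pattern 10xxxxxx).
 * Assumes well-formed UTF-8; no validation is performed. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}

/* Print the byte length, the code point count, and a hex dump of the
 * bytes, so the same literal can be compared at each stage. */
void dump_utf8(const char *label, const char *s)
{
    size_t bytes = strlen(s);
    size_t i;
    assert(label != NULL);
    printf("%s: %zu bytes, %zu code points:", label, bytes,
           utf8_code_points(s));
    for (i = 0; i < bytes; ++i)
        printf(" %02X", (unsigned char)s[i]);
    printf("\n");
}
```

Call dump_utf8 once on the text TinyXML returns and once on the text of the string-literal token. A string with two two-byte characters and two ASCII characters is 6 bytes but 4 code points, so a 4-byte result might mean that something along the way stores one byte per character. That is a guess to be checked against the hex dump, not a diagnosis.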

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of banmate6 at aol.com
> Sent: Tuesday, July 17, 2012 2:44 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] We Need UTF-8 String Literal Support for C
> Parser/Lexer
>
>
>
>
> Hello Folks
>
> I have a basic expression grammar that produces abstract syntax trees
> (ASTs) with the following node types:
>
> column (this represents a database column)
>
> functions
>     Boolean: "and", "or", "not"
>     Equality/Relational: "=", "!=", ">", etc.
>     Arithmetic: "+", "-", etc.
>
> literals
>     int, float
>     string
>
> In our case, we have an expression of the following form, taken from a
> tag in an XML document using the TinyXML C API.
>
>
>
>     col1 = "UTF-8 string"
>
>
> The AST looks like this, as expected:
>
>
> relational node, "=" function
>     child node 1, column
>     child node 2, literal string
>
>
> Unfortunately, the literal string in child node 2 comes out as a 4-byte
> string, whereas the original UTF-8 string is 6 bytes. We are not sure
> whether TinyXML or ANTLR is mishandling the UTF-8 literal.
> We will do more testing to find out.
>
>
> However, does anybody have suggestions in advance that might explain
> this? Does the C code that ANTLR generates in fact handle UTF-8 string
> literals in this context? Is there something I must do in order to
> handle UTF-8?
>
> For your information, the version of ANTLR we are using came from
> "libantlr3c-3.2.tar". I am not sure if this version handles UTF-8.
> Again, any advice or insight is appreciated.
>
>
>
> Cheers, Mate
>
>

