[antlr-interest] UTF-8 input?

Jim Idle jimi at temporal-wave.com
Wed Jan 20 08:30:47 PST 2010


You need to remember to state which target you are talking about.

I have written a new universal input stream for the next version of the C runtime. It takes 8-bit, 16-bit, UTF-8, UTF-16, UCS2, UTF-32 and EBCDIC input (code generation will change slightly to support this). It is not well tested right now, but it will be available shortly as a 3.3 snapshot release on the downloads page.

In the meantime, the easiest thing to do is to convert to UCS2 using the supplied converter in the current runtime. This will not work with surrogate pairs in UTF-16, but most people do not need that.
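
To check whether the surrogate pair limitation matters for a given input, one option is to scan the converted 16-bit buffer for code units in the surrogate range. The following is only a sketch in plain C; has_surrogates is a hypothetical helper name and is not part of the ANTLR runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ANTLR API.
 * Returns 1 if the 16-bit buffer contains any code unit in the
 * surrogate range 0xD800..0xDFFF, meaning the text uses characters
 * outside the Basic Multilingual Plane that a one-unit-per-character
 * UCS2 stream cannot represent. Returns 0 otherwise. */
static int has_surrogates(const uint16_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] >= 0xD800 && buf[i] <= 0xDFFF) {
            return 1;
        }
    }
    return 0;
}
```

If this returns 0 for your input, treating the converted text as UCS2 should lose nothing.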

If you really need UTF-8 without conversion then it is easy enough to write, or you can just steal the code from my check-in in about 10 minutes. Note that while the streams work, I have not yet provided ANTLR3_STRING support for UTF-8 and so on, so getting $text from such a stream may or may not work.
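
As a rough idea of what "easy enough to write" involves, the core of a UTF-8 input stream is a routine that decodes one code point at a time. The sketch below is standalone C and is not the code from the check-in; utf8_next and UTF8_INVALID are hypothetical names, and hooking it into the runtime's input stream interface is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define UTF8_INVALID 0xFFFFFFFFu  /* hypothetical error marker */

/* Hypothetical helper, not an ANTLR API.
 * Decodes one code point from p, where n bytes are available.
 * On success returns the code point and stores the number of bytes
 * consumed in *used. On malformed or truncated input returns
 * UTF8_INVALID with *used set to 1 (or 0 if n is 0), so a caller
 * can skip a byte and resynchronise. */
static uint32_t utf8_next(const unsigned char *p, size_t n, size_t *used)
{
    uint32_t cp;
    size_t need; /* number of continuation bytes expected */
    size_t i;

    if (n == 0) {
        *used = 0;
        return UTF8_INVALID;
    }
    *used = 1;

    if (p[0] < 0x80) {
        return p[0];                       /* plain ASCII */
    } else if ((p[0] & 0xE0) == 0xC0) {
        cp = p[0] & 0x1F; need = 1;        /* 2-byte sequence */
    } else if ((p[0] & 0xF0) == 0xE0) {
        cp = p[0] & 0x0F; need = 2;        /* 3-byte sequence */
    } else if ((p[0] & 0xF8) == 0xF0) {
        cp = p[0] & 0x07; need = 3;        /* 4-byte sequence */
    } else {
        return UTF8_INVALID;               /* stray continuation or bad lead byte */
    }

    if (n < need + 1) {
        return UTF8_INVALID;               /* truncated sequence */
    }
    for (i = 1; i <= need; i++) {
        if ((p[i] & 0xC0) != 0x80) {
            return UTF8_INVALID;           /* not a continuation byte */
        }
        cp = (cp << 6) | (uint32_t)(p[i] & 0x3F);
    }

    /* Reject overlong forms, encoded surrogates and out-of-range values. */
    if ((need == 1 && cp < 0x80) ||
        (need == 2 && cp < 0x800) ||
        (need == 3 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) ||
        cp > 0x10FFFF) {
        return UTF8_INVALID;
    }

    *used = need + 1;
    return cp;
}
```

A stream built on this would presumably call it from its lookahead and consume routines, advancing by *used bytes per character rather than by a fixed width.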

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Xie, Linlin
> Sent: Wednesday, January 20, 2010 3:32 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] UTF-8 input?
> 
> Can anyone tell me if antlr3.1.3 generated parser works with UTF-8
> input? If it does, how should I configure in the grammar? I noticed
> there are two macros ANTLR3_INLINE_INPUT_ASCII and
> ANTLR3_INLINE_INPUT_UTF16, but no UTF-8 one.
> 
> 
> 
> Many thanks!
> 
> Linlin
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address




