[antlr-interest] UTF-8 input?

Xie, Linlin linlin.xie at siemens.com
Fri Jan 22 04:57:40 PST 2010


Hi Jim,

Thanks for the reply. You said I can convert my UTF-8 input "to UCS2
using the supplied converter in the current runtime", but I can't find
any such converter in the ANTLR C runtime. Can you suggest which API to
use? By the way, I searched the archive and saw that someone with a
similar problem used the iconv library on Linux.
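
For reference, the iconv route mentioned above might look roughly like
the sketch below. It assumes a POSIX iconv (as in glibc) that accepts
the encoding names "UTF-8" and "UCS-2LE"; the helper name
utf8_to_ucs2le_iconv is hypothetical and nothing here comes from the
ANTLR runtime.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not an ANTLR API): converts src_len bytes of
 * UTF-8 in src to little-endian UCS2 in dst using POSIX iconv.
 * Returns the number of bytes written to dst, or (size_t)-1 on error
 * (unsupported encoding name, malformed input, or dst too small). */
size_t utf8_to_ucs2le_iconv(const char *src, size_t src_len,
                            char *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);

    /* iconv_open takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UCS-2LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv advances these pointers and decrements the counters. */
    char *in = (char *)src;
    char *out = dst;
    size_t in_left = src_len;
    size_t out_left = dst_cap;

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1 || in_left != 0)
        return (size_t)-1;

    return dst_cap - out_left;
}
```

The resulting buffer holds two bytes per character, so the character
count is the return value divided by two. On platforms where iconv is a
separate library, linking with -liconv may be needed.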

Thanks in advance!
Linlin


-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: 20 January 2010 16:31
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] UTF-8 input?

You need to remember to state which target you are talking about.

I have written a new universal input stream for the next version of the
C runtime. It takes 8-bit, 16-bit, UTF-8, UTF-16, UCS2, UTF-32 and EBCDIC
(code gen will change slightly to support this). It is not well tested
right now but will be available as a snapshot 3.3 release shortly on the
downloads page.

In the meantime the easiest thing to do is to convert to UCS2 using the
supplied converter in the current runtime. This will not work with
surrogate pairs in UTF-16, but most people do not need that.
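
To make the conversion and its surrogate-pair limitation concrete, here
is a minimal hand-written sketch. It is not the converter supplied with
the runtime; the function name utf8_to_ucs2 is hypothetical. It decodes
1- to 3-byte UTF-8 sequences into 16-bit units and simply fails on
anything above U+FFFF, since UCS2 cannot represent those code points
without surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ANTLR API): decodes src[0..src_len) from
 * UTF-8 into UCS2 code units in dst. Returns the number of units
 * written, or -1 on malformed input, on a code point above U+FFFF,
 * or when dst is too small. */
long utf8_to_ucs2(const unsigned char *src, size_t src_len,
                  uint16_t *dst, size_t dst_cap)
{
    size_t i = 0, n = 0;

    assert(src != NULL && dst != NULL);

    while (i < src_len) {
        unsigned char b = src[i];
        uint32_t cp;
        size_t extra;   /* number of continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else return -1; /* 4-byte lead (> U+FFFF) or stray continuation */

        if (extra > src_len - i - 1)
            return -1;  /* sequence truncated at end of input */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80)
                return -1;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong encodings and encoded surrogate code points. */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800))
            return -1;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return -1;

        if (n >= dst_cap)
            return -1;  /* output buffer full */
        dst[n++] = (uint16_t)cp;
        i += extra + 1;
    }
    return (long)n;
}
```

A destination buffer with one 16-bit unit per input byte is always large
enough, because every UTF-8 sequence accepted here produces exactly one
unit. The converted buffer can then be handed to whichever UCS2 input
stream your runtime version provides.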

If you really need UTF-8 without conversion then it is easy enough to
write, or you can just steal the code from my check-in, which should
land in about 10 minutes. Note that while the streams work, I have not
yet provided ANTLR3_STRING support for UTF-8 and so on, so getting $text
from such a stream may or may not work.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Xie, Linlin
> Sent: Wednesday, January 20, 2010 3:32 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] UTF-8 input?
> 
> Can anyone tell me if a parser generated by ANTLR 3.1.3 works with
> UTF-8 input? If it does, how should I configure it in the grammar? I
> noticed there are two macros, ANTLR3_INLINE_INPUT_ASCII and
> ANTLR3_INLINE_INPUT_UTF16, but no UTF-8 one.
> 
> 
> 
> Many thanks!
> 
> Linlin
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address



