[antlr-interest] Re: Unicode support
jean-claude.meilland at experian-scorex.com
Fri May 21 06:44:50 PDT 2004
Yes, my only requirement is to parse "<asian characters here>".
The language I parse has strings that can contain Unicode
characters, but the language itself doesn't need to be in Unicode.
I would be happy to have any suggestions on how to do this
(explanations or examples).
Thanks in advance,
--- In antlr-interest at yahoogroups.com, Mark Lentczner <markl at g...>
> > I have to generate in C++ and I will need it to parse strings with
> > Asian languages. So I guess I need some pretty efficient Unicode
> > support.
> I'm not clear here. Is your requirement that you have to parse
> constructs like:
> "<asian characters here>"
> Where the only non-US-ASCII characters appear between quotes? And
> the only restriction between those quotes is that it is a sequence
> of valid Unicode characters? If so, this is easily doable in
> C++ if you treat your input as UTF-8.
> If you need to support identifiers composed of non-US-ASCII
> characters, it is a bit more difficult, but still doable.
> This is exactly what I'm doing: My language is defined over the
> Unicode character set, allows Unicode in string literals, comments,
> identifier names, and in a few cases operators (such as U+00F7,
> the division sign). I lex and parse the language with Antlr, using a
> C++ lexer that accepts a UTF-8 encoded Unicode stream of bytes.
> I'd be happy to share my work on this.
> > Hope 3.0 will be out before end of summer because that's my deadline.
> I think the time frame is longer than that.
> - Mark
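
For the string-literal case discussed above, here is a minimal sketch of what "treat your input as UTF-8" could look like in plain C++. It is not Mark's lexer and not ANTLR-generated code: the function names (utf8_sequence_length, decode_utf8, scan_string_literal) are made up for illustration, it assumes a C++17 compiler, and it assumes the literal has no escape sequences. Because the quote character is plain ASCII and can never occur inside a multi-byte UTF-8 sequence, the scanner only has to validate each sequence and stop at the closing quote.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Length of the UTF-8 sequence introduced by lead byte b,
// or 0 if b cannot start a valid sequence.
inline std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;                // US-ASCII
    if (b >= 0xC2 && b <= 0xDF) return 2;  // 0xC0 and 0xC1 would be overlong
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;  // higher lead bytes exceed U+10FFFF
    return 0;
}

// Decodes one code point starting at input[pos].
// On success, advances pos past the sequence; on failure, leaves pos alone.
inline std::optional<char32_t> decode_utf8(std::string_view input,
                                           std::size_t& pos) {
    if (pos >= input.size()) return std::nullopt;
    const unsigned char lead = static_cast<unsigned char>(input[pos]);
    const std::size_t len = utf8_sequence_length(lead);
    if (len == 0 || pos + len > input.size()) return std::nullopt;

    // Payload bits of the lead byte: 5, 4, or 3 bits for lengths 2, 3, 4.
    char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(input[pos + i]);
        if ((c & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }

    // Reject overlong encodings, surrogates, and values above U+10FFFF.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_for_len[len] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF)) {
        return std::nullopt;
    }
    pos += len;
    return cp;
}

// Scans a double-quoted literal starting at input[pos] and returns the raw
// UTF-8 bytes between the quotes. Escape sequences are NOT handled here.
inline std::optional<std::string> scan_string_literal(std::string_view input,
                                                      std::size_t& pos) {
    if (pos >= input.size() || input[pos] != '"') return std::nullopt;
    std::size_t cur = pos + 1;
    const std::size_t start = cur;
    while (cur < input.size()) {
        if (input[cur] == '"') {
            std::string body(input.substr(start, cur - start));
            pos = cur + 1;  // continue lexing after the closing quote
            return body;
        }
        if (!decode_utf8(input, cur)) return std::nullopt;  // malformed UTF-8
    }
    return std::nullopt;  // unterminated literal
}
```

The same decode_utf8 helper is the starting point for the harder case of non-ASCII identifiers or operators: the lexer would classify the decoded code point (for example U+00F7, the division sign) instead of looking at single bytes.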