[antlr-interest] Re: Unicode support
Mark Lentczner
markl at glyphic.com
Wed May 19 08:35:35 PDT 2004
> I have to generate in C++ and I will need it to parse strings with
> Asian languages. So I guess I need some pretty efficient Unicode
> support.
I'm not clear here. Is your requirement that you have to parse
constructs like:
"<asian characters here>"
Where the only non-US-ASCII characters appear between quotes, and the
only restriction between those quotes is that the contents be a sequence
of valid Unicode characters? If so, this is easily doable with Antlr in
C++ if you treat your input as UTF-8.
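Why UTF-8 makes the quoted-string case easy: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so the ASCII quote byte (0x22) can never appear inside an encoded Asian character. A byte-oriented lexer can therefore find the closing quote without decoding anything, and only needs to check that the bytes in between are well-formed UTF-8. The following is a minimal hand-written sketch of that idea, assuming C++17 and a literal with no escape sequences; `scanStringLiteral` and `utf8SeqLen` are hypothetical names, and this is not ANTLR-generated code.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Length of the UTF-8 sequence introduced by `lead`, or 0 if `lead` can
// never start a well-formed sequence (RFC 3629).
std::size_t utf8SeqLen(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

// Scans a double-quoted literal whose opening quote is at input[pos].
// The body may hold any well-formed UTF-8; escapes are NOT supported
// (an assumption of this sketch). On success, returns the raw bytes
// between the quotes and moves pos past the closing quote; otherwise
// returns std::nullopt and leaves pos unchanged.
std::optional<std::string> scanStringLiteral(const std::string& input, std::size_t& pos) {
    if (pos >= input.size() || input[pos] != '"') return std::nullopt;
    std::size_t i = pos + 1;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (c == '"') {  // 0x22 never occurs inside a multi-byte sequence
            std::string body = input.substr(pos + 1, i - pos - 1);
            pos = i + 1;
            return body;
        }
        std::size_t len = utf8SeqLen(c);
        if (len == 0 || i + len > input.size()) return std::nullopt;
        // Allowed range for the second byte, narrowed for some lead bytes
        // to exclude overlong forms, surrogates and code points > U+10FFFF.
        unsigned char lo = 0x80, hi = 0xBF;
        if (c == 0xE0) lo = 0xA0;
        else if (c == 0xED) hi = 0x9F;
        else if (c == 0xF0) lo = 0x90;
        else if (c == 0xF4) hi = 0x8F;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(input[i + k]);
            unsigned char minB = (k == 1) ? lo : 0x80;
            unsigned char maxB = (k == 1) ? hi : 0xBF;
            if (b < minB || b > maxB) return std::nullopt;
        }
        i += len;
    }
    return std::nullopt;  // unterminated literal
}
```

For example, given the bytes of `x = "日本" + 1` with `pos` at the opening quote, the function returns the six UTF-8 bytes of 日本 and leaves `pos` just after the closing quote; a literal containing a malformed sequence such as `C0 AF` is rejected.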
If you need to support identifiers composed of non-US-ASCII characters,
it is a bit more difficult, but still doable.
This is exactly what I'm doing: My language is defined over the full
Unicode character set, allows Unicode in string literals, comments,
identifier names, and in a few cases operators (such as U+00F7, the
division sign). I lex and parse the language with Antlr, generating a
C++ lexer that accepts a UTF-8 encoded Unicode stream of bytes.
I'd be happy to share my work on this.
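For identifiers and operators such as the division sign, the lexer has to reason about whole code points rather than single bytes: U+00F7 arrives as the two bytes `C3 B7`. The following is a minimal hand-written sketch, assuming C++17, of decoding a UTF-8 byte stream into code points and grouping them into tokens. `decodeUtf8`, `tokenize`, `Token` and especially `isIdentChar` are hypothetical; a real lexer would classify characters using Unicode properties such as XID_Start/XID_Continue (UAX #31), and this is not Mark's implementation or ANTLR-generated code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sentinel for malformed input; never a valid code point.
constexpr char32_t kInvalid = 0xFFFFFFFF;

// Decodes one UTF-8 sequence at input[pos]. On success advances pos and
// returns the code point; on failure leaves pos alone and returns kInvalid.
// Rejects overlong forms, surrogates and values above U+10FFFF.
char32_t decodeUtf8(const std::string& input, std::size_t& pos) {
    if (pos >= input.size()) return kInvalid;
    unsigned char b0 = static_cast<unsigned char>(input[pos]);
    std::size_t len;
    char32_t cp;
    if (b0 < 0x80) { ++pos; return b0; }
    if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; cp = b0 & 0x1F; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
    else return kInvalid;
    if (pos + len > input.size()) return kInvalid;
    for (std::size_t k = 1; k < len; ++k) {
        unsigned char b = static_cast<unsigned char>(input[pos + k]);
        if ((b & 0xC0) != 0x80) return kInvalid;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Smallest code point legitimately encoded with `len` bytes.
    static const char32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return kInvalid;
    pos += len;
    return cp;
}

enum class Kind { Identifier, Divide, Error };
struct Token { Kind kind; std::u32string text; };

// HYPOTHETICAL rule for illustration only: ASCII letters, digits and '_'
// plus every non-ASCII code point except U+00F7 count as identifier chars.
bool isIdentChar(char32_t cp) {
    if (cp < 0x80)
        return (cp >= U'a' && cp <= U'z') || (cp >= U'A' && cp <= U'Z') ||
               (cp >= U'0' && cp <= U'9') || cp == U'_';
    return cp != 0xF7;
}

// Splits UTF-8 text into identifiers and U+00F7 (DIVISION SIGN) tokens,
// skipping spaces; stops with an Error token on anything else.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        char32_t cp = decodeUtf8(input, pos);
        if (cp == kInvalid) { out.push_back({Kind::Error, {}}); break; }
        if (cp == U' ') continue;
        if (cp == 0xF7) { out.push_back({Kind::Divide, std::u32string(1, cp)}); continue; }
        if (!isIdentChar(cp)) { out.push_back({Kind::Error, {}}); break; }
        std::u32string text(1, cp);
        // Extend the identifier; rewind if the next code point ends it.
        while (pos < input.size()) {
            std::size_t save = pos;
            char32_t next = decodeUtf8(input, pos);
            if (next == kInvalid || !isIdentChar(next)) { pos = save; break; }
            text.push_back(next);
        }
        out.push_back({Kind::Identifier, text});
    }
    return out;
}
```

With this sketch, the UTF-8 input `日 ÷ x1` yields three tokens: an identifier U+65E5, a divide token, and the identifier `x1`. Decoding to `char32_t` once, up front, keeps the character-class tests simple at the cost of a small amount of per-character work.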
> I hope 3.0 will be out before the end of summer, because that's my deadline.
I think the time frame is longer than that.
- Mark