[antlr-interest] Unicode handling
Mark Lentczner
markl at glyphic.com
Wed Apr 21 20:30:56 PDT 2004
> Note that 2.7.3 should do pretty well at UNICODE. Give it a shot :)
> \uFFFE is the max valid unicode right? -1 shouldn't be a problem
> anymore.
No, it is not. The maximum is U+10FFFF (since Unicode 3.1). Yup, 21
bits. And XML 1.0 is defined in light of this, so to properly handle
anything coming from XML, one should handle the full range.
Note that Java is broken in this regard. See
http://weblogs.java.net/pub/wlg/1202 for a discussion. I understand
that some XML tools in Java go to great lengths to get around the
problem.
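
To make the 21-bit point concrete, here is a minimal C++ sketch of why a
16-bit character type cannot hold every code point: anything above U+FFFF
has to be split into a UTF-16 surrogate pair. The names (kMaxCodePoint,
isScalarValue, needsSurrogatePair, toSurrogates, fromSurrogates) are
hypothetical helpers made up for illustration; they are not part of ANTLR
or of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Highest valid Unicode code point: U+10FFFF, which needs 21 bits.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// True if cp is in range and is not a surrogate code point.
constexpr bool isScalarValue(std::uint32_t cp) {
    return cp <= kMaxCodePoint && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// True if cp does not fit in a single 16-bit unit.
constexpr bool needsSurrogatePair(std::uint32_t cp) {
    return cp > 0xFFFF && cp <= kMaxCodePoint;
}

// Split a supplementary code point into (high, low) UTF-16 surrogates.
// Precondition: needsSurrogatePair(cp).
inline std::pair<std::uint16_t, std::uint16_t> toSurrogates(std::uint32_t cp) {
    assert(needsSurrogatePair(cp));
    const std::uint32_t v = cp - 0x10000;  // 20 bits remain after the offset
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) };
}

// Recombine a (high, low) surrogate pair into one code point.
// Precondition: hi is in D800..DBFF and lo is in DC00..DFFF.
inline std::uint32_t fromSurrogates(std::uint16_t hi, std::uint16_t lo) {
    return 0x10000
         + ((static_cast<std::uint32_t>(hi) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(lo) - 0xDC00);
}
```

A lexer that reads 16-bit units would see U+10FFFF as the two units 0xDBFF
and 0xDFFF, so it either has to recombine them as above or work on 32-bit
code points from the start.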
> Oh. I think C++ doesn't handle UNICODE yet, but I'll let Ric answer
> this ;)
And indeed, I'm generating C++...
- Mark