[antlr-interest] Unicode handling

Mark Lentczner markl at glyphic.com
Wed Apr 21 20:30:56 PDT 2004


> Note that 2.7.3 should do pretty well at UNICODE.  Give it a shot :)
> \uFFFE is the max valid unicode right?  -1 shouldn't be a problem 
> anymore.
No, it is not.  The maximum is U+10FFFF (characters above U+FFFF have 
been assigned since Unicode 3.1).  Yup, 21 bits.  And XML 1.0 is 
defined in light of this, so to properly handle anything coming from 
XML, one should handle the full range.
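
To make the 21-bit point concrete, here is a minimal C++ sketch of what 
handling the full range involves.  The names (kMaxCodePoint, 
is_scalar_value, from_surrogates) are made up for illustration and are 
not part of ANTLR or any library.  It assumes code points are held in a 
32-bit integer, not a 16-bit character type.

```cpp
#include <cstdint>

// Highest code point in the Unicode codespace; it needs 21 bits.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// True for UTF-16 surrogate values, which are not characters themselves.
constexpr bool is_surrogate(std::uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// True if cp is a Unicode scalar value: in range and not a surrogate.
constexpr bool is_scalar_value(std::uint32_t cp) {
    return cp <= kMaxCodePoint && !is_surrogate(cp);
}

// Combine a UTF-16 high/low surrogate pair into one code point.
// Assumes hi is in [0xD800, 0xDBFF] and lo is in [0xDC00, 0xDFFF];
// the caller is expected to check that first.
constexpr std::uint32_t from_surrogates(std::uint16_t hi, std::uint16_t lo) {
    return 0x10000u
         + ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(lo) - 0xDC00u);
}
```

The largest surrogate pair, 0xDBFF followed by 0xDFFF, combines to 
exactly U+10FFFF, which is why a 16-bit character type alone cannot 
represent every code point.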

Note that Java is broken in this regard, since its char type is only 
16 bits wide.  See http://weblogs.java.net/pub/wlg/1202 for a 
discussion.  I understand that some XML tools in Java go to great 
lengths to get around the problem.

> Oh.  I think C++ doesn't handle UNICODE yet, but I'll let Ric answer 
> this ;)
And indeed, I'm generating C++...

	- Mark



 