[antlr-interest] antlr 3 unicode support

Tom Moog tmoog at polhode.com
Thu Nov 20 22:06:14 PST 2003



Just to remind you Java guys thinking about lexing Unicode:
Unicode doesn't stop at 2**16. It extends up to 0x10FFFF if you
want to include musical symbols, Babylonian, math-style letters,
and so on. For Java XML parsers this means reaching characters
above 2**16 through surrogate pairs: two adjacent 16-bit words,
each contributing its low-order ten bits. The resulting 20-bit
value is added to 0x10000 to give the actual code point.
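A minimal Java sketch of the surrogate-pair arithmetic above, assuming input is held in ordinary Java `String`s (UTF-16). The class name `SurrogateDecode` and its `decode` method are hypothetical and exist only for illustration. The manual decoding is cross-checked against `Character.toCodePoint`, and the loop shows how a lexer might step through input one code point at a time with `String.codePointAt` and `Character.charCount` rather than one `char` at a time.

```java
public class SurrogateDecode {
    // Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
    // Each surrogate carries 10 payload bits in its low-order bits; the
    // combined 20-bit value is offset by 0x10000.
    public static int decode(char high, char low) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        int hi10 = high & 0x3FF; // low-order ten bits of the high surrogate
        int lo10 = low & 0x3FF;  // low-order ten bits of the low surrogate
        return 0x10000 + ((hi10 << 10) | lo10);
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, stored as two Java chars.
        String clef = "\uD834\uDD1E";
        int cp = decode(clef.charAt(0), clef.charAt(1));
        System.out.printf("U+%X%n", cp); // prints U+1D11E

        // Cross-check against the standard library.
        System.out.println(cp == Character.toCodePoint(clef.charAt(0), clef.charAt(1)));

        // A lexer reading code points must advance by Character.charCount(c),
        // which is 2 for characters above 0xFFFF and 1 otherwise.
        String input = "a\uD834\uDD1Eb";
        for (int i = 0; i < input.length(); ) {
            int c = input.codePointAt(i);
            System.out.printf("U+%04X%n", c); // U+0061, U+1D11E, U+0062
            i += Character.charCount(c);
        }
    }
}
```

Decoding the largest possible pair, `\uDBFF\uDFFF`, yields 0x10FFFF, which is exactly the upper limit mentioned above.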
