[antlr-interest] antlr 3 unicode support

Ric Klaren klaren at cs.utwente.nl
Sat Nov 22 06:56:57 PST 2003


Hi,

On Fri, Nov 21, 2003 at 01:06:14AM -0500, Tom Moog wrote:
> Just to remind you Java guys thinking about lexing Unicode:
> Unicode doesn't stop at 2**16.  It extends up to 0x10FFFF if you
> want to include music symbols, Babylonian, math-style letters,
> and so on.  For Java XML parsers this means using the low-order
> ten bits from two adjacent 16-bit words (surrogate pairs) to
> reach things above 2**16.

Hmm, would this mean that we have to deal with Unicode decoding
ourselves in the lexer, and use 32-bit values for the tokens/strings?
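
To make the decoding step concrete, here is a minimal sketch of turning
UTF-16 code units into 32-bit code points. The function name
nextCodePoint is hypothetical and not part of ANTLR, and mapping unpaired
surrogates to U+FFFD is an assumption (a real lexer might report an error
instead). It uses the C++11 types char16_t and char32_t.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of ANTLR: decodes one code point from
// UTF-16 input starting at pos and advances pos past the consumed units.
// Precondition: pos < in.size().
// Assumption: unpaired surrogates are returned as U+FFFD.
inline char32_t nextCodePoint(const std::u16string& in, std::size_t& pos)
{
    const char32_t replacement = 0xFFFD;
    const char16_t hi = in[pos++];

    if (hi < 0xD800 || hi > 0xDFFF)
        return hi;                  // ordinary character below 2**16
    if (hi >= 0xDC00)
        return replacement;         // low surrogate with no high before it
    if (pos >= in.size())
        return replacement;         // high surrogate at end of input

    const char16_t lo = in[pos];
    if (lo < 0xDC00 || lo > 0xDFFF)
        return replacement;         // high surrogate not followed by a low
    ++pos;

    // Ten low-order bits from each unit, offset by 0x10000.
    return 0x10000
         + ((static_cast<char32_t>(hi) - 0xD800) << 10)
         + (static_cast<char32_t>(lo) - 0xDC00);
}
```

A lexer built on something like this would call it in a loop and match
on the returned 32-bit values rather than on the raw 16-bit units.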

So far, for C++, I was only looking at making the backend wchar_t/wstring
aware, although extending it from there would not be hard with templates.
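
As a rough illustration of the template route, the sketch below
parameterises a small character buffer on the character type, so the same
code serves char, wchar_t, or a 32-bit type. BasicCharBuffer and its
members are hypothetical names for illustration only and do not reflect
the actual ANTLR C++ runtime; char32_t assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch, not ANTLR's C++ runtime: a character buffer
// templated on the character type.
template <typename CharT>
class BasicCharBuffer {
public:
    using string_type = std::basic_string<CharT>;

    explicit BasicCharBuffer(string_type text)
        : text_(std::move(text)), pos_(0) {}

    bool atEnd() const { return pos_ >= text_.size(); }
    CharT peek() const { return text_[pos_]; }  // precondition: !atEnd()
    void consume() { ++pos_; }

    // Consume characters while pred holds and return them as token text.
    template <typename Pred>
    string_type takeWhile(Pred pred) {
        const std::size_t start = pos_;
        while (!atEnd() && pred(peek()))
            consume();
        return text_.substr(start, pos_ - start);
    }

private:
    string_type text_;
    std::size_t pos_;
};

// The same template covers narrow, wide, and 32-bit characters.
using CharBuffer    = BasicCharBuffer<char>;
using WCharBuffer   = BasicCharBuffer<wchar_t>;
using U32CharBuffer = BasicCharBuffer<char32_t>;
```

With this shape, moving from wchar_t to full 32-bit code points is a
change of template argument rather than a rewrite of the buffer code.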

Cheers,

Ric
-- 
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893722  ----
-----+++++*****************************************************+++++++++-------
 Why don't we just invite them to dinner and massacre them all when they're
  drunk? You heard the man. There's seven hundred thousand of them. Ah? ..
           So it'd have to be something simple with pasta, then.
                 From: Interesting Times by Terry Pratchett


 
