[antlr-interest] Re: proposal for 2.7.4: charVocabulary defaults to ascii 1..127

Sat May 1 14:27:49 PDT 2004

> How would that look like? UTF-8? UTF-16? Something else?

That describes the transformation format, not the available character range. UTF-8 input may well end up as UTF-32
after decoding. I would also leave surrogates alone (two UTF-16 code units that together form one UTF-32 character).
Handling them should be the responsibility of the grammar writer. Supporting UTF-32 is overkill for the time being, but
UTF-16 has by now become the most common Unicode representation. UTF-8 is mainly a transport format and should be
converted to UTF-16 before parsing (unless circumstances don't allow this, as we saw recently).
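
To make the two steps above concrete, here is a minimal sketch in Python (not ANTLR code): converting UTF-8 input to
UTF-16 code units before lexing, and combining a surrogate pair into one code point, which is the part I would leave to
the grammar writer. The function names are hypothetical and only serve the illustration.

```python
def utf8_to_utf16_units(data):
    """Decode UTF-8 bytes and return the equivalent list of UTF-16 code units."""
    text = data.decode("utf-8")
    raw = text.encode("utf-16-be")  # big-endian, no byte order mark
    # Each UTF-16 code unit occupies exactly two bytes.
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# "A" followed by U+1D11E (musical symbol G clef), encoded as UTF-8.
units = utf8_to_utf16_units(b"A\xf0\x9d\x84\x9e")
clef = combine_surrogates(units[1], units[2])
```

A lexer whose vocabulary is 16 bits wide would see three input symbols here; whether the last two are treated as one
character is a decision for the grammar.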

Mike
--
www.soft-gems.net
