[antlr-interest] Re: proposal for 2.7.4: charVocabulary defaults to ascii 1..127
Oliver Zeigermann
oliver at zeigermann.de
Sat May 1 14:52:06 PDT 2004
Mike Lischke wrote:
>>How would that look like? UTF-8? UTF-16? Something else?
>
>
> This describes the transformation format not the available character range.
Right. Sorry, I mixed this up...
> would also leave alone surrogates (taking two UTF-16 code points to form one UTF-32 character). This should be the
> responsibility of the grammar writer. Supporting UTF-32 is overkill at the time being, but UTF-16 (as the most common
> Unicode representation) is quite common meanwhile. UTF-8 is mainly a transport format and should be converted to UTF-16
> before parsing (unless certain circumstances don't allow this as we saw recently).
Now you seem to be mixing something up. UTF-16 and UTF-32 are character
encodings too, just like UTF-8. All of them are decoded to
characters before parsing.
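
To illustrate, here is a minimal Python sketch (not ANTLR code; the sample
character U+1D11E is an arbitrary choice of mine from outside the 16-bit
range). It encodes one character in all three formats, shows that UTF-16
needs a surrogate pair for it, and checks that decoding any of the three
gives back the same single character.

```python
# One code point outside the Basic Multilingual Plane: U+1D11E (musical G clef).
# The character is an arbitrary example, not something from the ANTLR discussion.
ch = "\U0001D11E"

# Three different byte-level encodings of the same abstract character.
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

# Split the UTF-16 bytes into 16-bit code units; this character needs two
# of them (a surrogate pair).
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

print("UTF-8 bytes: ", utf8.hex())
print("UTF-16 units:", [hex(u) for u in units])
print("UTF-32 bytes:", utf32.hex())

# Decoding any of the three yields the same single character, which is
# what a lexer would see after the input has been decoded.
decoded = [
    utf8.decode("utf-8"),
    utf16.decode("utf-16-be"),
    utf32.decode("utf-32-be"),
]
print(all(d == ch for d in decoded))
```

A lexer working on raw UTF-16 code units instead would see the two
surrogates as separate items, which is the case Mike wants to leave to the
grammar writer.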
Oliver