[antlr-interest] Re: proposal for 2.7.4: charVocabulary defaults to ascii 1..127

Oliver Zeigermann oliver at zeigermann.de
Mon May 3 10:11:06 PDT 2004


Oooops! Again I was wrong :(

Brian, thanks for the enlightening pointers :)

Oliver

Brian Smith wrote:
> Oliver Zeigermann wrote:
> 
> 
>>Mike Lischke wrote:
>>
>>>>Now you seem to mix something up. Both UTF-16 and UTF-32 are 
>>>>character encodings as well, just as UTF-8. All of them are 
>>>>converted to characters before parsing.
>>>
>>>Sure, but what is the internal representation? Actually, it is UTF-16. So although it is a transformation format, it is
>>>also the actual character representation. Hence UTF-16 (as well as UTF-32) can be processed directly. UTF-8 has to be
>>>converted first to one of these formats (usually, at least). This is what I meant.
>>
>>What the internal representation is, you simply do not know and there is 
>>also no need to know. Certainly, it is not UTF-16 as it only allows for 
>>64K characters which is far too little.
>>
>> 
>>
> 
> Oliver,
> 
> In ANTLR for Java, you do know the representation and for some 
> applications it is important. It is a 16-bit integer described by the 
> 'char' type. For JRE 1.2-1.4, 'char' is a 16-bit Unicode code point. 
> (Unicode 1.x - 3.x depending on the JRE version). In JRE 1.5, 'char' is 
> redefined to be a 16-bit Unicode 4.0 code unit, that may represent 
> either a whole character (code point), or a partial character that needs 
> to be combined with an adjacent one according to the UTF-16 
> transformation rules. See http://weblogs.java.net/pub/wlg/1202 and the 
> documents it references.
> 
> IMO, in order to fully support Unicode 4.0, ANTLR (for Java) would need 
> to replace all usages of 'char' with 'java.lang.String' or 'int'.
> 
> - Brian
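
To illustrate the code unit versus code point distinction Brian describes, here is a minimal sketch, assuming a Java 5 or later runtime where the supplementary-character methods on String and Character are available. The class name CodePointDemo and the helper toCodePoints are hypothetical names made up for this example; nothing here is ANTLR API. It only shows why a single 'char' cannot hold every Unicode 4.0 character while an 'int' can.

```java
final class CodePointDemo {
    // Hypothetical helper (not part of ANTLR): walk a String by code point
    // instead of by 16-bit char, returning each code point as an int.
    static int[] toCodePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // combines a surrogate pair if present
            out[n++] = cp;
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1D50A, which lies outside the 16-bit range and is
        // stored in a Java String as the surrogate pair D835 DD0A.
        String s = "a\uD835\uDD0A";
        System.out.println(s.length());                              // 3 chars (code units)
        System.out.println(toCodePoints(s).length);                  // 2 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true
    }
}
```

A lexer that reads one 'char' at a time would see the second character above as two separate values, which is the motivation for the 'int' suggestion.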
