[antlr-interest] Re: proposal for 2.7.4: charVocabulary defaults to ascii 1..127

Brian L. Smith brian-l-smith at uiowa.edu
Sat May 1 14:21:47 PDT 2004


Oliver wrote:
> lgcraymer wrote:
>> charVocabulary = "unicode";
>
> How would that look like? UTF-8? UTF-16? Something else?


In Java, at least, UTF-8 decoding is done before the lexer even sees the
characters, so a UTF-8 vocabulary wouldn't make sense. But since ANTLR 2.x
uses the "char" type for everything in Java, the lexer still has to deal
with UTF-16 decoding itself, i.e. surrogate pairs for characters outside
the 16-bit range. (Is ANTLR 3 going to be able to handle full Unicode
4.0?) So, presumably charVocabulary="unicode" would mean UTF-16, at least
in Java.
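
To illustrate the point, here is a minimal sketch in plain Java (no ANTLR
involved) of what a lexer reading from a Reader would actually be handed.
The class and method names are made up for the example, and it uses
library calls (StandardCharsets, Character.isHighSurrogate) that only
exist in later JDKs, so treat it as an illustration rather than something
that fits a 2.7.x setup as-is.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of ANTLR.
class CharVocabularyDemo {

    // Decode UTF-8 bytes through a Reader, one char at a time, the way a
    // char-based lexer would consume them. Returns the UTF-16 code units.
    static char[] decodeUtf8(byte[] bytes) {
        StringBuilder units = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                units.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return units.toString().toCharArray();
    }

    public static void main(String[] args) {
        // UTF-8 for U+00E9 (two bytes) followed by U+1D11E (four bytes).
        byte[] input = {
            (byte) 0xC3, (byte) 0xA9,
            (byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E
        };
        for (char unit : decodeUtf8(input)) {
            String kind = Character.isHighSurrogate(unit) ? " (high surrogate)"
                        : Character.isLowSurrogate(unit) ? " (low surrogate)"
                        : "";
            System.out.printf("U+%04X%s%n", (int) unit, kind);
        }
    }
}
```

The six input bytes come out as three chars: the UTF-8 layer is already
gone by the time anything char-based sees the input, but the character
outside the 16-bit range arrives as two separate surrogate chars that the
lexer would have to pair up itself.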

- Brian


 


