[antlr-interest] unicode strings using supplemental char range

Terence Parr parrt at cs.usfca.edu
Thu Jun 24 16:26:42 PDT 2004


On Jun 24, 2004, at 3:20 PM, Mark Lentczner wrote:
>> Actually, I just had an idea.  First, thanks to your help, I know that
>> UTF-16 encoded in a string is unambiguously UTF-16.  Now, the only
>> question is, how do we match a 21-bit char against it?  What if we just
>> specified that the input must be UTF-16 also?  Then, ANTLR can pretend
>> everything is 16 bits, right?
> Well, as you pointed out, this is like my hack of lexing UTF-8 for my
> parsers in C++.  Operative word is HACK.  The other problem is that
> this will fall apart as soon as you want to put in the other cool
> Unicode class-based checks (isIdentifierStart, isLowerCase, etc.).

Well, I was going to say that UTF-16 is how I'd leave it, until you 
said this last thing.  isLowerCase, for example, simply won't work for 
chars above 16 bits if we have UTF-16 strings, since each such char 
shows up as two separate 16-bit units.  I'll have to take your word for 
it that real languages will use codes above 16 bits, btw. ;)
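
To make the isLowerCase problem concrete, here is a minimal sketch in Java. It assumes a runtime that has the code point methods on Character and String (Java 5 or later); the class and method names are made up for illustration and are not part of ANTLR. It tests U+10428 (DESERET SMALL LETTER LONG I), a lowercase letter outside the 16-bit range, once per 16-bit unit and once per code point.

```java
final class SupplementaryCaseDemo {
    // Hypothetical helper: tests each 16-bit UTF-16 code unit on its own.
    static boolean allLowerByCodeUnit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLowerCase(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: decodes surrogate pairs and tests whole code points.
    static boolean allLowerByCodePoint(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isLowerCase(cp)) {
                return false;
            }
            i += Character.charCount(cp); // advance 1 or 2 code units
        }
        return true;
    }

    public static void main(String[] args) {
        // U+10428 is stored as the surrogate pair D801 DC28 in a String.
        String deseret = new String(Character.toChars(0x10428));
        System.out.println("code units:    " + deseret.length());
        System.out.println("by code unit:  " + allLowerByCodeUnit(deseret));
        System.out.println("by code point: " + allLowerByCodePoint(deseret));
    }
}
```

The per-unit test only ever sees the two surrogates, which are not letters at all, so it cannot report the character as lowercase; the per-code-point test can.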

> Sorry Terence, suck it up and change all Strings to UnicodeArray which

Shite.  Rats.  Argh!  That means I'm back to the days of C/C++ where I 
have to define String.  Crap.  Anybody have any idea what the speed hit 
will be for us LATIN encoded people?

> is a class wrapper around int[].  Better yet, make it a protocol, and
> then supply implementations that scan over String, over int[], and
> perhaps over UTF-8 encoded byte[]...

;)
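
For reference, the "protocol" idea could look roughly like the sketch below. Everything here is an assumption for illustration: the interface name, the EOF convention, and the helper are invented, not ANTLR API, and it again relies on the Java 5+ code point methods. The lexer would see only full code points, whether the input is an int[] or a UTF-16 String; a UTF-8 byte[] variant would implement the same interface and is left out to keep this short.

```java
// Hypothetical protocol: a forward-only source of Unicode code points.
interface CodePointScanner {
    int EOF = -1;

    // Returns the next code point, or EOF when the input is exhausted.
    int next();
}

// Scans an int[] that already holds one code point per element.
final class IntArrayScanner implements CodePointScanner {
    private final int[] data;
    private int pos;

    IntArrayScanner(int[] data) {
        this.data = data;
    }

    public int next() {
        return pos < data.length ? data[pos++] : EOF;
    }
}

// Scans a UTF-16 String, combining surrogate pairs into one code point.
final class StringScanner implements CodePointScanner {
    private final String text;
    private int pos;

    StringScanner(String text) {
        this.text = text;
    }

    public int next() {
        if (pos >= text.length()) {
            return EOF;
        }
        int cp = text.codePointAt(pos);
        pos += Character.charCount(cp); // 1 unit for BMP, 2 for supplementary
        return cp;
    }
}

final class ScannerDemo {
    // Example client: a class-based check that works on any scanner.
    static int countLowerCase(CodePointScanner in) {
        int count = 0;
        for (int cp = in.next(); cp != CodePointScanner.EOF; cp = in.next()) {
            if (Character.isLowerCase(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] cps = {'a', 0x10428, 'B'};
        String s = new String(cps, 0, cps.length);
        System.out.println(countLowerCase(new IntArrayScanner(cps)));
        System.out.println(countLowerCase(new StringScanner(s)));
    }
}
```

With this shape, Latin-only input can stay in a plain String and pay only for the interface call per character; how large that cost is in practice would have to be measured.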

Thanks a bunch for the clarifications...

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing





 