[antlr-interest] unicode strings using supplemental char range
Terence Parr
parrt at cs.usfca.edu
Thu Jun 24 16:26:42 PDT 2004
On Jun 24, 2004, at 3:20 PM, Mark Lentczner wrote:
>> Actually, I just had an idea. First, thanks to your help, I know that
>> UTF-16 encoded in a string is unambiguously UTF-16. Now, the only
>> question is, how do we match a 21-bit char against it? What if we just
>> specified that the input must be UTF-16 also? Then, ANTLR can pretend
>> everything is 16 bits, right?
> Well, as you pointed out, this is like my hack of lexing UTF-8 for my
> parsers in C++. Operative word is HACK. The other problem is that
> this will fall apart as soon as you want to put in the other cool
> Unicode class-based checks (isIdentifierStart, isLowerCase, etc.).
Well, I was going to say that I'd leave it at UTF-16, until you
said this last thing. isLowerCase, for example, simply won't work on
the individual 16-bit units of a UTF-16 string when a character needs
two of them. I'll have to take your word for it that real
languages will use codes above 16 bits, btw. ;)
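
To make the isLowerCase problem concrete, here is a minimal sketch. It assumes the int-based Character methods (codePointAt, toChars, isLowerCase(int)) that Java provides from version 5 onward; the class and method names are made up for illustration and are not part of ANTLR. It uses U+10428 (DESERET SMALL LETTER LONG I), a lowercase letter that needs a surrogate pair in UTF-16.

```java
// Sketch only: class and method names are hypothetical, not ANTLR code.
class SupplementaryCaseDemo {
    // U+10428 DESERET SMALL LETTER LONG I: a lowercase letter above U+FFFF.
    static final int DESERET_SMALL_LONG_I = 0x10428;

    // The "pretend everything is 16 bits" approach: test each code unit.
    static boolean anyUnitIsLowerCase(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLowerCase(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // The code point approach: reassemble the surrogate pair, then test.
    static boolean firstCodePointIsLowerCase(String s) {
        return Character.isLowerCase(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(DESERET_SMALL_LONG_I));
        System.out.println("UTF-16 code units: " + s.length());
        System.out.println("per-unit isLowerCase: " + anyUnitIsLowerCase(s));
        System.out.println("per-code-point isLowerCase: " + firstCodePointIsLowerCase(s));
    }
}
```

Each half of the surrogate pair is classified as a surrogate, not a letter, so the per-unit check never sees a lowercase character; only the reassembled code point does.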
> Sorry Terence, suck it up and change all Strings to UnicodeArray which
Shite. Rats. Argh! That means I'm back to the days of C/C++ where I
have to define String. Crap. Anybody have any idea what the speed hit
will be for us LATIN encoded people?
> is a class wrapper around int[]. Better yet, make it a protocol, and
> then supply implementations that scan over String, over int[], and
> perhaps over UTF-8 encoded byte[]...
;)
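
For illustration, the protocol Mark describes might look roughly like the sketch below. Every name here (CodePointInput, StringInput, IntArrayInput) is hypothetical and not ANTLR's actual API, and the String scanner again assumes the code point methods from Java 5 and later. A UTF-8 byte[] scanner would implement the same interface; it is left out to keep the sketch short.

```java
// Hypothetical names throughout; a sketch of the idea, not ANTLR's API.
interface CodePointInput {
    int EOF = -1;

    // Returns the next Unicode code point, or EOF when the input is exhausted.
    int next();
}

// Scans a UTF-16 String, combining each surrogate pair into one code point.
class StringInput implements CodePointInput {
    private final String text;
    private int pos = 0;

    StringInput(String text) {
        this.text = text;
    }

    public int next() {
        if (pos >= text.length()) {
            return EOF;
        }
        int cp = text.codePointAt(pos);
        pos += Character.charCount(cp); // advance by 1 or 2 code units
        return cp;
    }
}

// Scans an int[] that already holds one code point per element.
class IntArrayInput implements CodePointInput {
    private final int[] codePoints;
    private int pos = 0;

    IntArrayInput(int[] codePoints) {
        this.codePoints = codePoints;
    }

    public int next() {
        return pos < codePoints.length ? codePoints[pos++] : EOF;
    }
}

class CodePointInputDemo {
    // Drains an input and counts the code points it yields.
    static int count(CodePointInput in) {
        int n = 0;
        while (in.next() != CodePointInput.EOF) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "a" + new String(Character.toChars(0x10428)) + "b";
        System.out.println(count(new StringInput(text)));
        System.out.println(count(new IntArrayInput(new int[] {0x61, 0x10428, 0x62})));
    }
}
```

A lexer written against the interface sees the same sequence of code points from either source, so plain Latin text could stay in a String while the matching code works on ints.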
Thanks a bunch for the clarifications...
Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing