[antlr-interest] unicode 16bit versus new 21bit stuff
parrt at cs.usfca.edu
Sun Jun 20 11:55:55 PDT 2004
On Jun 20, 2004, at 11:37 AM, John D. Mitchell wrote:
>>>>>> "Terence" == Terence Parr <parrt at cs.usfca.edu> writes:
>>>>>>> On Jun 19, 2004, at 3:36 PM, Mark Lentczner writes:
>>> Seems to me that you can still encode chars and tokens in the same 32
>>> bit int:
>>>   any value <= 0x10FFFF is Unicode
>>>   any value > 0x10FFFF is a Token type
>>> Or am I missing something?
>> Heh, you're right. I was focused on only 11 bits left, but if I treat it
>> as a 32-bit int, not 2 smaller ints, then the values work out great! I
>> have 0x10FFFF+1 .. 0xFFFFFFFF to mess with. That's um...lots. ;)
> Hmm... Is my senility setting in? I thought I recalled that you had a
> reason you needed them separated?
> If not then rock on.
Well, they are still separated by the < or > operator. For a second there,
I thought I was masking out the upper 16 bits and then treating it as a
short 11-bit int, but I'm not. I'm doing a simple comparison, which
leaves like 2^11 * unicode-size token types. :)
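
A minimal sketch of the comparison scheme discussed above, assuming the shared value is treated as an unsigned 32-bit integer. The names (MAX_UNICODE, is_char, is_token_type, token_type_value) are hypothetical and chosen only for illustration; this is not ANTLR's actual code.

```python
# Assumption: one unsigned 32-bit value carries either a char or a token type.
MAX_UNICODE = 0x10FFFF   # highest Unicode code point (21 bits)
MAX_UINT32 = 0xFFFFFFFF  # largest unsigned 32-bit value

# Number of values left over for token types: 0x10FFFF+1 .. 0xFFFFFFFF
TOKEN_TYPE_COUNT = MAX_UINT32 - MAX_UNICODE


def is_char(value: int) -> bool:
    """True if value falls in the Unicode code point range."""
    return 0 <= value <= MAX_UNICODE


def is_token_type(value: int) -> bool:
    """True if value lies above the Unicode range but still fits in 32 bits."""
    return MAX_UNICODE < value <= MAX_UINT32


def token_type_value(index: int) -> int:
    """Map a zero-based token type index (hypothetical) to its encoded value."""
    if not 0 <= index < TOKEN_TYPE_COUNT:
        raise ValueError("token type index out of range")
    return MAX_UNICODE + 1 + index


if __name__ == "__main__":
    print(is_char(ord("A")))                       # a plain character
    print(is_token_type(token_type_value(0)))      # first token type
    print(f"{TOKEN_TYPE_COUNT:,} token types available")
```

Note that the split is a single comparison against 0x10FFFF, with no masking or shifting. In a language whose 32-bit int is signed, values with the top bit set would compare as negative, so the check would need an unsigned comparison or a wider type.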
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing