[antlr-interest] unicode 16bit versus new 21bit stuff

Sun Jun 20 11:55:55 PDT 2004

On Jun 20, 2004, at 11:37 AM, John D. Mitchell wrote:

>>>>>> "Terence" == Terence Parr <parrt at cs.usfca.edu> writes:
>>>>>>> On Jun 19, 2004, at 3:36 PM, Mark Lentczner writes:
> [...]
>
>>> Seems to me that you can still encode chars and tokens in the same
>>> 32-bit int: any value <= 0x10FFFF is Unicode; any value > 0x10FFFF is
>>> a Token type.
>
>>> Or am I missing something?
>
>> Heh, you're right.  I was focused on only 11 bits left, but if I treat it
>> as a 32-bit int, not 2 smaller ints, then the values work out great!  We
>> have 0x10FFFF+1 .. 0xFFFFFFFF to mess with.  That's um...lots. ;)  Thanks!
>
> Hmm... Is my senility setting in?  I thought I recalled that you had some
> reason you needed them separated?
>
> If not then rock on.

Well, they are still separated by the < or > operator.  For a second there,
I thought I was masking out the upper 16 bits and then treating it as a
short 11-bit int, but I'm not.  I'm doing a simple comparison, which
leaves roughly 2^11 * unicode-size token types. :)
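
A minimal sketch of the comparison scheme discussed above, assuming the
32-bit value is treated as unsigned.  The class and method names
(SymbolSpace, isChar, tokenType, ...) are made up for illustration and
are not ANTLR's actual API.  Java ints are signed, so the sketch uses
Integer.compareUnsigned (Java 8+) to keep values above 0x7FFFFFFF on the
token side of the boundary.

```java
public final class SymbolSpace {
    /** Highest Unicode code point; same value as Character.MAX_CODE_POINT. */
    public static final int MAX_CODE_POINT = 0x10FFFF;

    private SymbolSpace() {}

    /** True if the 32-bit value, read as unsigned, is a Unicode code point. */
    public static boolean isChar(int symbol) {
        return Integer.compareUnsigned(symbol, MAX_CODE_POINT) <= 0;
    }

    /** True if the 32-bit value lies above the Unicode range. */
    public static boolean isTokenType(int symbol) {
        return !isChar(symbol);
    }

    /** Hypothetical helper: map a zero-based token index to a value above 0x10FFFF. */
    public static int tokenType(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        // May wrap to a negative int; as an unsigned value it is still > 0x10FFFF.
        return MAX_CODE_POINT + 1 + index;
    }

    /** Number of 32-bit values left over for token types: 2^32 - 0x110000. */
    public static long tokenTypeCount() {
        return (1L << 32) - (MAX_CODE_POINT + 1L);
    }

    public static void main(String[] args) {
        System.out.println(isChar('A'));                // a char value
        System.out.println(isChar(0x10FFFF));           // last code point
        System.out.println(isTokenType(0x110000));      // first token value
        System.out.println(isTokenType(tokenType(42))); // an arbitrary token
        System.out.println(tokenTypeCount());           // values available for tokens
    }
}
```

The exact count of leftover values is 2^32 - 0x110000 = 4,293,853,184,
which is a bit under 2^11 * 2^21.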

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing
