[antlr-interest] unicode 16bit versus new 21bit stuff

Ric Klaren klaren at cs.utwente.nl
Mon Jun 21 02:53:01 PDT 2004


Hi,

On Fri, Jun 18, 2004 at 05:49:57PM -0700, Terence Parr wrote:
> My analysis algorithms use pure int so there is no trouble with that,
> however, I do encode token types in the upper 16 bits of a 32-bit int and
> have all chars in the lower 16 bits.  This is purely programming
> convenience as I know how to print out a token type by its value
> range.
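
For reference, the scheme quoted above can be sketched roughly as follows.
This is only an illustration under assumptions: the class PackedLabel and
its methods are hypothetical names, not taken from ANTLR's sources, and the
sketch assumes token types start at 1 so that a label with empty upper bits
is always a char.

```java
// Hypothetical sketch of the quoted packing scheme; PackedLabel and its
// methods are illustrative names, not ANTLR's actual code.
final class PackedLabel {
    static final int TYPE_SHIFT = 16;      // token types live in the upper 16 bits
    static final int MAX_TYPE = 0xFFFF;    // largest token type that fits in 16 bits

    private PackedLabel() {}

    /** A 16-bit char is stored unchanged in the lower 16 bits. */
    static int ofChar(char c) {
        return c;
    }

    /** A token type (assumed to be 1..0xFFFF) is shifted into the upper 16 bits. */
    static int ofTokenType(int type) {
        if (type <= 0 || type > MAX_TYPE) {
            throw new IllegalArgumentException("token type out of range: " + type);
        }
        return type << TYPE_SHIFT;
    }

    /** Any label with non-zero upper bits is a token type. */
    static boolean isTokenType(int label) {
        return (label >>> TYPE_SHIFT) != 0;
    }

    /** Prints a label by looking only at its value range. */
    static String describe(int label) {
        return isTokenType(label)
            ? "token type " + (label >>> TYPE_SHIFT)
            : "char '" + (char) label + "'";
    }
}
```

With this layout, PackedLabel.describe(PackedLabel.ofTokenType(5)) yields
"token type 5" and PackedLabel.describe(PackedLabel.ofChar('a')) yields
"char 'a'".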

Erm... sorry, but why? Why use some C-style encoding scheme from ancient
times when a struct (or a class in Java) can do the trick? It comes with the
added bonus that a smart enough compiler (byte code compiler?) can pack the
struct for you, so you don't have to bother with the shifting yourself (and
it can, e.g., use a 64-bit value where that's possible). Anyway, if it's
only in the analysis I don't care; if it ends up in the support lib I'd be
somewhat careful about doing 'hacks'
like this...
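
To make the struct/class alternative concrete, here is a minimal sketch of
what I have in mind. The class Label, its Kind enum, and the factory method
names are hypothetical and only for illustration; nothing here is existing
ANTLR API. The point is that neither field has to give up bits to the
other.

```java
// Hypothetical sketch of the class-based alternative; Label and Kind are
// illustrative names, not part of ANTLR.
final class Label {
    enum Kind { CHAR, TOKEN_TYPE }

    private final Kind kind;
    private final int value;   // full int range available to either kind

    private Label(Kind kind, int value) {
        this.kind = kind;
        this.value = value;
    }

    /** Accepts any Unicode code point, including supplementary ones. */
    static Label ofChar(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        return new Label(Kind.CHAR, codePoint);
    }

    /** Token types are not limited to 16 (or 11) bits. */
    static Label ofTokenType(int type) {
        return new Label(Kind.TOKEN_TYPE, type);
    }

    Kind kind() { return kind; }

    int value() { return value; }

    @Override
    public String toString() {
        return kind == Kind.CHAR
            ? "char U+" + Integer.toHexString(value).toUpperCase()
            : "token type " + value;
    }
}
```

Here Label.ofChar(0x1D11E) and Label.ofTokenType(70000) are both fine, at
the cost of an object per label instead of a bare int.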

> I don't want to go to 64-bit ints as most CPUs are still 32 bits
> natively.  If I use 21-bit unicode values, that leaves 2^11 or 2048
> token type values, which makes me a bit nervous.
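
The arithmetic in the quoted paragraph can be checked with a few lines. This
is just a sketch; the class BitBudget and the method tokenTypeValues are
made-up names for illustration. It assumes the char occupies the low bits
of a 32-bit int and the token type gets whatever bits remain.

```java
// Hypothetical helper for the bit-budget arithmetic; BitBudget is an
// illustrative name, not part of ANTLR.
final class BitBudget {
    static final int INT_BITS = 32;
    static final int UNICODE_BITS = 21;   // U+0000..U+10FFFF fits in 21 bits

    private BitBudget() {}

    /** Number of distinct token type values left after reserving charBits for chars. */
    static int tokenTypeValues(int charBits) {
        if (charBits < 2 || charBits >= INT_BITS) {
            throw new IllegalArgumentException("charBits must be in 2..31");
        }
        return 1 << (INT_BITS - charBits);
    }

    /** True if every Unicode code point fits in the given number of bits. */
    static boolean fitsAllCodePoints(int charBits) {
        return charBits < INT_BITS && Character.MAX_CODE_POINT < (1 << charBits);
    }
}
```

BitBudget.tokenTypeValues(16) gives 65536 for the current 16-bit layout,
while BitBudget.tokenTypeValues(BitBudget.UNICODE_BITS) gives the 2048
mentioned above.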

I would not welcome going 64-bit on the C++ side. This is all pretty new
tech, and I have already seen reports float about in bug trackers here and
there about 64-bit issues with some compilers/support libraries.
Furthermore, it probably slows down 32-bit machines (the current majority
of systems; I don't expect 64-bit to go mainstream soon, as there's very
little benefit to it as of yet).

> I want to do unicode "right" this time.  Anybody have a strong opinion
> about the new supplemental (beyond 16bit unicode) char values and/or
> whether 2048 is a serious token type limitation?

Why use a data structure that limits you from the outset? Is a
struct/class so much slower in analysis that you need an encoding scheme
like this? (I'd expect to see this kind of stuff in old C programs where
people are trying to squeeze extra cycles/memory out of something.)
Remember, premature optimization is the root of all evil...

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893755  ----
-----+++++*****************************************************+++++++++-------
 Time what is time - I wish I knew how to tell You why - It hurts to know -
          Aren't we machines - Time what is time - Unlock the door
               - And see the truth - Then time is time again
                From: 'Time what is Time' by Blind Guardian


 