[antlr-interest] unicode 16bit versus new 21bit stuff
Ric Klaren
klaren at cs.utwente.nl
Mon Jun 21 02:53:01 PDT 2004
Hi,
On Fri, Jun 18, 2004 at 05:49:57PM -0700, Terence Parr wrote:
> My analysis algorithms use pure ints, so there is no trouble with that;
> however, I do encode token types in the upper 16 bits of a 32-bit int and
> keep all chars in the lower 16 bits. This is purely a programming
> convenience, as I know how to print out a token type by its value
> range.
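A minimal Java sketch of the 16/16 packing described above, under stated assumptions: a char label is stored as its raw 16-bit value, a token type t is stored as t << 16, and token types start at 1 so every token label is at least 0x10000. The class and method names (PackedLabel, fromChar, fromTokenType, describe) are hypothetical and not taken from ANTLR's sources.

```java
// Hypothetical sketch (not ANTLR's code) of the 16/16 packing: one 32-bit int
// label holds either a char in the low 16 bits or a token type in the high 16.
final class PackedLabel {
    static final int CHAR_BITS = 16;
    static final int CHAR_MASK = (1 << CHAR_BITS) - 1; // 0xFFFF

    // A char label is just the 16-bit char value: 0x0000..0xFFFF.
    static int fromChar(char c) {
        return c;
    }

    // A token label puts the type in the high half. Assumption: types run
    // from 1 to 0x7FFF, so every token label is positive and >= 0x10000.
    static int fromTokenType(int type) {
        if (type < 1 || type > 0x7FFF) {
            throw new IllegalArgumentException("token type out of range: " + type);
        }
        return type << CHAR_BITS;
    }

    // The value range alone tells the two kinds of label apart.
    static boolean isTokenType(int label) {
        return label > CHAR_MASK;
    }

    static int tokenTypeOf(int label) {
        return label >>> CHAR_BITS;
    }

    static char charOf(int label) {
        return (char) (label & CHAR_MASK);
    }

    // Printing only needs the range check above.
    static String describe(int label) {
        return isTokenType(label)
            ? "token type " + tokenTypeOf(label)
            : "char '" + charOf(label) + "'";
    }
}
```

Capping token types at 0x7FFF in this sketch keeps every label a positive int, so the plain signed comparison in isTokenType stays correct.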
Erm.. sorry.. erm, but why? Why use a C-style encoding scheme from an
ancient time when a struct (or a class in Java) can do the trick? That also
gives you the added bonus that a smart enough compiler (byte code compiler?)
can encode the struct for you in some way, so you don't have to bother with
the shifting yourself (and it could e.g. use a 64-bit value when that's
possible). Well, anyway, if it's only in the analysis I don't care; if it
ends up in the support lib I'd be somewhat careful about doing 'hacks' like
this...
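A rough sketch of the struct/class alternative, assuming Java and a hypothetical Label type (not an ANTLR API): the kind and the value live in separate fields, so a full 21-bit code point and a token type never compete for bits, and no shifting or range decoding is needed.

```java
// Hypothetical Label type (not an ANTLR API): an explicit tag plus a full
// int value, instead of bit fields packed into one int.
final class Label {
    enum Kind { CHAR, TOKEN_TYPE }

    final Kind kind;
    final int value; // a full code point (up to U+10FFFF) or a token type

    private Label(Kind kind, int value) {
        this.kind = kind;
        this.value = value;
    }

    static Label ofCodePoint(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        return new Label(Kind.CHAR, codePoint);
    }

    // Assumption: token types are positive; no upper bound is imposed.
    static Label ofTokenType(int type) {
        if (type < 1) {
            throw new IllegalArgumentException("token type must be >= 1: " + type);
        }
        return new Label(Kind.TOKEN_TYPE, type);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Label)) {
            return false;
        }
        Label other = (Label) o;
        return kind == other.kind && value == other.value;
    }

    @Override
    public int hashCode() {
        return 31 * kind.hashCode() + value;
    }

    // No value ranges to decode: the kind field says what to print.
    @Override
    public String toString() {
        return kind == Kind.CHAR
            ? String.format("char U+%04X", value)
            : "token type " + value;
    }
}
```

The trade-off is an object per label instead of a bare int; whether that is measurable in the analysis would need profiling.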
> I don't want to go to 64-bit ints as most CPUs are still 32-bit
> natively. If I use 21-bit Unicode values, that leaves 2^11 or 2048
> token type values, which makes me a bit nervous.
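The arithmetic behind the 2048 figure, as a small Java check: the largest Unicode code point is U+10FFFF (Character.MAX_CODE_POINT), which needs 21 bits, leaving 32 - 21 = 11 bits, i.e. 2^11 = 2048 token type values. BitBudget and its methods are hypothetical helpers for illustration only.

```java
// Hypothetical helpers for the bit budget of a packed 32-bit label.
final class BitBudget {
    // Number of bits needed to hold every value in [0, maxValue].
    static int bitsNeeded(int maxValue) {
        return 32 - Integer.numberOfLeadingZeros(maxValue);
    }

    // Number of distinct values that fit in the bits left after the char field.
    static long valuesLeft(int totalBits, int charBits) {
        return 1L << (totalBits - charBits);
    }

    public static void main(String[] args) {
        int charBits = bitsNeeded(Character.MAX_CODE_POINT); // 0x10FFFF needs 21 bits
        System.out.println(charBits + " char bits leave "
            + valuesLeft(32, charBits) + " token types");
    }
}
```

Running main prints "21 char bits leave 2048 token types"; with the current 16-bit char field, valuesLeft(32, 16) gives 65536.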
I would not welcome going 64 bits on the C++ side. This is all pretty new
tech, and I've already seen reports floating around in bug trackers here and
there about 64-bit issues with some compilers/support libraries.
Furthermore, it would probably slow down 32-bit machines (the current
majority of systems; I don't expect 64-bit to become mainstream soon, as
there's very little benefit to it as of yet).
> I want to do Unicode "right" this time. Anybody have a strong opinion
> about the new supplemental (beyond 16-bit Unicode) char values and/or
> whether 2048 is a serious token type limitation?
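For reference, a sketch of what the supplemental characters mean for a 16-bit char type such as Java's: a code point above U+FFFF is stored as a high/low surrogate pair, so a lexer that reads one char at a time sees two units. The Supplementary class is a hypothetical helper; it assumes the code point methods added in J2SE 5.0 (Character.toChars, String.codePointAt, String.codePointCount, Character.charCount).

```java
// Hypothetical helper showing supplementary code points in a 16-bit char world.
// Requires J2SE 5.0+ for the code point methods used here.
final class Supplementary {
    // A code point above U+FFFF becomes two chars: a high and a low surrogate.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Decode a UTF-16 string into code points, stepping over surrogate pairs.
    static int[] codePointsOf(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int i = 0;
        for (int offset = 0; offset < s.length(); ) {
            int cp = s.codePointAt(offset);
            out[i++] = cp;
            offset += Character.charCount(cp);
        }
        return out;
    }
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) comes out of toUtf16 as the pair \uD834 \uDD1E, and the string "a\uD834\uDD1Eb" has length() 4 but only three code points.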
Why use a data structure that limits you from the outset??? Is a
struct/class so much slower in the analysis that you're resorting to an
encoding scheme like this? (I'd expect to see this kind of stuff in old C
programs where people are trying to squeeze extra cycles/memory out of
something.) Remember: premature optimization is the root of all evil....
Cheers,
Ric