[antlr-interest] Unicode

Gregg Reynolds dev at arabink.com
Tue Jan 31 14:00:18 PST 2006


Martin Probst wrote:
>> There are many horrible things about Windows, but this isn't one of them.
>>
>> http://www.unicode.org/faq/utf_bom.html#22
> 
> Well I know it's standardised, it's just not used on Unix AFAIK. And I
> think that's a good decision, with Unicode there is really no reason to
> have a mixed-encoding system.

BOM is not an OS issue, nor is it about mixed encodings; it addresses
the problem of hardware variance, i.e. byte order.  It's unavoidable; the
"problem" is simply how to manage the blessing of variety.  Software that
can't handle BOM?  Well, Milton said it best:

Wherefore did Nature powre her bounties forth,
With such a full and unwithdrawing hand,
Covering the earth with odours, fruits, and flocks,
Thronging the Seas with spawn innumerable,
But all to please, and sate the curious taste?
And set to work millions of spinning Worms,
That in their green shops weave the smooth-hair'd silk
To deck her Sons, and that no corner might
Be vacant of her plenty, in her own loyns
She hutch't th' all-worshipt ore, and precious gems
To store her children with; if all the world
Should in a pet of temperance feed on Pulse,
Drink the clear stream, and nothing wear but Freize,
Th' all-giver would be unthank't, would be unprais'd,
Not half his riches known, and yet despis'd,
And we should serve him as a grudging master,
As a penurious niggard of his wealth,
And live like Natures bastards, not her sons,
Who would be quite surcharged with her own weight,
And strangl'd with her waste fertility;
Th' earth cumber'd, and the wing'd air dark't with plumes,
The herds would over-multitude their Lords,
The Sea o'refraught would swell, and th' unsought diamonds
Would so emblaze the forhead of the Deep,
And so bestudd with Stars, that they below
Would grow inur'd to light, and com at last
To gaze upon the Sun with shameless brows.

So lest we live like Technology's bastards, not her sons, we should
support all the annoying Unicode stuff like BOM and canonical forms.

;)
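
For what it's worth, supporting both need not be much code.  Here is a
minimal sketch in Python, not a definitive implementation: the function
name decode_with_bom, the UTF-8 fallback when no BOM is found, and the
choice of NFC as the canonical form are all my own assumptions for the
sake of illustration.

```python
import codecs
import unicodedata

# Check the longer marks first: the UTF-32 LE BOM (FF FE 00 00) begins
# with the UTF-16 LE BOM (FF FE), so the order of this table matters.
# (Consequence: UTF-16 LE text whose first character is U+0000 would be
# misread as UTF-32 LE; this sketch accepts that ambiguity.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Strip a leading BOM if present, decode accordingly, return NFC text.

    `default` is the assumed encoding when no BOM is found (hypothetical
    policy; pick whatever suits your input).
    """
    encoding = default
    for bom, name in _BOMS:
        if data.startswith(bom):
            data = data[len(bom):]  # drop the mark itself
            encoding = name
            break
    # Normalize so canonically equivalent sequences compare equal.
    return unicodedata.normalize("NFC", data.decode(encoding))
```

The same bytes-in, text-out step could sit in front of any lexer, so
the grammar itself never has to see the mark.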

-gregg

P.S.  This is an interesting page on cross-platform multibyte C stuff,
from the documentation of libmba ("A library of generic C modules"):

http://www.ioplex.com/~miallen/libmba/dl/docs/ref/text_details.html

