[antlr-interest] Re: unicode support
micheal_jor <open.zone at virgin.net>
open.zone at virgin.net
Tue Dec 17 07:45:58 PST 2002
--- In antlr-interest at yahoogroups.com, Pete Forman <pete.forman at w...>
wrote:
> At 2002-12-16 14:51 -0800, Terence Parr wrote:
> >I can convert a table to Java with a shell script probably if we can
> >find a convenient table.
>
> http://www.unicode.org/Public/UNIDATA/ReadMe.txt
>
> That is for the current version, i.e. Unicode 3.2. You might wish to
> stick at version 3.0 which is the last 16 bit version. Current
> Unicode uses 21 bits but Java does not grok it.
What worked for me in the past:
I imported the UnicodeData-3.1.0.txt text file
(http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt)
into a database and wrote simple queries to dump the list of
char-values and char-ranges for each UnicodeCategory. I used MS SQL
Server, with MS Access as a prototyping-friendly front end, to write
all the queries/formatting code.
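For anyone without a database handy, the same dump can probably be
done in plain Java. The following is only a rough sketch, not the
queries I actually used: it assumes the usual semicolon-separated
UnicodeData layout (field 0 = hex code value, field 2 = general
category), the class and method names (CategoryRanges,
rangesByCategory) are made up for illustration, and it ignores the
"<..., First>"/"<..., Last>" pairs that the file uses to abbreviate
large blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups UnicodeData-style lines by general
// category and collapses consecutive code values into ranges.
public class CategoryRanges {

    static Map<String, List<String>> rangesByCategory(List<String> lines) {
        // category -> all code values seen for that category
        Map<String, List<Integer>> byCategory = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(";", -1);
            if (fields.length < 3) {
                continue; // not a data line
            }
            int codeValue = Integer.parseInt(fields[0], 16);
            byCategory.computeIfAbsent(fields[2], k -> new ArrayList<>())
                      .add(codeValue);
        }

        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : byCategory.entrySet()) {
            List<Integer> values = entry.getValue();
            Collections.sort(values);
            List<String> ranges = new ArrayList<>();
            int start = values.get(0);
            int prev = start;
            for (int i = 1; i < values.size(); i++) {
                int current = values.get(i);
                if (current != prev + 1) {
                    // gap found: close the current range
                    ranges.add(format(start, prev));
                    start = current;
                }
                prev = current;
            }
            ranges.add(format(start, prev));
            result.put(entry.getKey(), ranges);
        }
        return result;
    }

    // Single values print as "0061", ranges as "0041-0042".
    static String format(int start, int end) {
        return start == end
                ? String.format("%04X", start)
                : String.format("%04X-%04X", start, end);
    }

    public static void main(String[] args) {
        // A few lines in UnicodeData format, inlined to keep this
        // self-contained; a real run would read the file instead.
        List<String> sample = Arrays.asList(
            "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
            "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
            "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
            "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;",
            "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
            "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;;;;00E0;");
        System.out.println(rangesByCategory(sample));
    }
}
```

The output is one list of single values and ranges per category,
which is the shape needed to generate a lookup table.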
In any case, this strategy should work with any other RDBMS as long
as what you want is the char-values and char-ranges for each
UnicodeCategory. Otherwise, Character.getType(char ch) will tell you
which UnicodeCategory a given char belongs to.
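As a rough illustration of the Character.getType route (a sketch
only): getType returns an int that can be compared against the
category constants on java.lang.Character. The class name
CategoryLookup and the helper categoryName are made up for this
example, and only a handful of categories are mapped.

```java
// Hypothetical helper: maps the int returned by Character.getType
// to the two-letter category abbreviation used in UnicodeData.
public class CategoryLookup {

    static String categoryName(char ch) {
        switch (Character.getType(ch)) {
            case Character.UPPERCASE_LETTER:     return "Lu";
            case Character.LOWERCASE_LETTER:     return "Ll";
            case Character.DECIMAL_DIGIT_NUMBER: return "Nd";
            case Character.SPACE_SEPARATOR:      return "Zs";
            default:                             return "other";
        }
    }

    public static void main(String[] args) {
        char[] samples = { 'A', 'a', '7', ' ' };
        for (char ch : samples) {
            System.out.println((int) ch + " -> " + categoryName(ch));
        }
    }
}
```

The answers come from whatever Unicode version the running JDK
implements, which need not match the version of the data file.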
Micheal