[antlr-interest] Re: unicode support

micheal_jor <open.zone at virgin.net>
Tue Dec 17 07:45:58 PST 2002


--- In antlr-interest at yahoogroups.com, Pete Forman <pete.forman at w...> wrote:
> At 2002-12-16 14:51 -0800, Terence Parr wrote:
> >I can convert a table to Java with a shell script probably if we can
> >find a convenient table.
> 
> http://www.unicode.org/Public/UNIDATA/ReadMe.txt
> 
> That is for the current version, i.e. Unicode 3.2.  You might wish to
> stick at version 3.0 which is the last 16 bit version.  Current
> Unicode uses 21 bits but Java does not grok it.

What worked for me in the past:

I imported the text file
http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt
into a database and wrote simple queries to dump the list of
char-values and char-ranges for each UnicodeCategory. I used MS SQL
Server, with MS Access as a prototyping-friendly front end for writing
the queries and formatting code.
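
For anyone who would rather skip the database, here is a rough Java
sketch of the same idea: read UnicodeData lines and collapse the code
points into ranges per category. It is not the queries I used. The class
and method names (UnicodeDataRanges, rangesByCategory) are made up for
illustration. It assumes the usual semicolon-separated layout (code
point in field 0, name in field 1, general category in field 2), that
lines arrive in ascending code point order, and that large blocks are
given as a "<..., First>" / "<..., Last>" pair of lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups UnicodeData entries into inclusive ranges per category.
class UnicodeDataRanges {

    // Each value is a list of {first, last} pairs, both ends inclusive.
    static Map<String, List<int[]>> rangesByCategory(List<String> lines) {
        Map<String, List<int[]>> out = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] f = line.split(";", -1);
            int cp = Integer.parseInt(f[0], 16); // field 0: code point in hex
            String name = f[1];                  // field 1: character name
            String cat = f[2];                   // field 2: general category, e.g. "Lu"

            List<int[]> ranges = out.computeIfAbsent(cat, k -> new ArrayList<>());
            int[] last = ranges.isEmpty() ? null : ranges.get(ranges.size() - 1);

            // A "<..., Last>" line closes the block opened by the preceding "<..., First>" line.
            boolean closesBlock = name.endsWith(", Last>");
            if (last != null && (cp == last[1] + 1 || closesBlock)) {
                last[1] = cp;                    // extend the current range
            } else {
                ranges.add(new int[] {cp, cp});  // start a new range
            }
        }
        return out;
    }
}
```

Feeding it the lines of the file (for example via
java.nio.file.Files.readAllLines) gives one list of ranges per category
code, which can then be formatted however the grammar needs.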

In any case, this strategy should work with other RDBMSes as long as
what you want is the char-values and char-ranges for each Unicode
category. Otherwise, Character.getType(char ch) will tell you which
category a given char belongs to (it returns an int constant such as
Character.UPPERCASE_LETTER).
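
As a rough illustration of the Character.getType route, the sketch below
walks all 16-bit char values and collects the ranges that fall in one
category. The class and method names (CategoryRanges, rangesFor) are
invented for this example, and the results reflect whatever Unicode
version the running JDK implements, not necessarily 3.0 or 3.1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: derives char ranges for a category straight from the JDK tables.
class CategoryRanges {

    // Returns inclusive {first, last} pairs of chars whose Character.getType equals category.
    static List<int[]> rangesFor(int category) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means no range is currently open
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            boolean match = Character.getType((char) c) == category;
            if (match && start < 0) {
                start = c;                              // open a new range
            } else if (!match && start >= 0) {
                ranges.add(new int[] {start, c - 1});   // close the open range
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, Character.MAX_VALUE}); // range runs to the last char
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Print every range of uppercase letters as hex, one per line.
        for (int[] r : rangesFor(Character.UPPERCASE_LETTER)) {
            if (r[0] == r[1]) {
                System.out.println(String.format("%04X", r[0]));
            } else {
                System.out.println(String.format("%04X..%04X", r[0], r[1]));
            }
        }
    }
}
```

This needs no data file at all, at the cost of being tied to the JDK's
own tables.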

Micheal



 
