[antlr-interest] unicode support

Tue Dec 17 01:32:09 PST 2002

On Monday, December 16, 2002, at 04:33  PM, Tim Anderson wrote:

>> From: Terence Parr [mailto:parrt at jguru.com]
>>
>>
>> A few things that would be interesting to add:
>>
>> Allow you to reference sets like JAVA_IDENTIFIER or LATIN_... and then
>> characters like 'GREATER-THAN SIGN' and 'APOSTROPHE-QUOTE'.  The latter
>
> That would be cool. For the OpenJMS selector grammar I currently have
> protected rules corresponding to the Character.isJavaIdentifierStart() and
> Character.isJavaIdentifierPart() methods - being able to replace these
> long (and non-obvious) rules would be great.

Yep, those obvious ones for ID and such would be easy, but would it be
worth it just to do those?
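
For illustration, here is a minimal sketch of how the ID-style classes
could be derived mechanically rather than written out by hand. It scans
the 16-bit char range and collapses a predicate such as
Character.isJavaIdentifierStart() into inclusive ranges. The class name
CharClassRanges and the method ranges() are hypothetical, not part of
ANTLR, and the code assumes a modern JDK (lambdas and generics postdate
this thread).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class CharClassRanges {
    /**
     * Collapse a character predicate into a list of inclusive
     * [first, last] ranges over the 16-bit char range.
     */
    public static List<int[]> ranges(IntPredicate p) {
        List<int[]> out = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a range"
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            if (p.test(c)) {
                if (start < 0) start = c;          // open a new range
            } else if (start >= 0) {
                out.add(new int[] {start, c - 1}); // close the open range
                start = -1;
            }
        }
        if (start >= 0) out.add(new int[] {start, Character.MAX_VALUE});
        return out;
    }

    public static void main(String[] args) {
        List<int[]> idStart = ranges(c -> Character.isJavaIdentifierStart(c));
        // Print the first few ranges in hex.
        for (int i = 0; i < Math.min(5, idStart.size()); i++) {
            int[] r = idStart.get(i);
            System.out.printf("U+%04X..U+%04X%n", r[0], r[1]);
        }
    }
}
```

The same helper works for Character.isJavaIdentifierPart() or any other
predicate on Character; the resulting ranges could then be emitted as
lexer alternatives.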

>
>> would be easy: just a hashtable lookup if I can find the unicode char
>> index in Java somewhere ;)  The former is harder as there is nothing in
>> Java's Character.java class that lets me get a set of chars for say
>> GREEK_EXTENDED.  Anybody know a good library that would give me a set
>> of chars from these char class names?  I've just found:
>
> You could derive them using Character.UnicodeBlock and a bit of brute
> force, i.e., iterate through all possible chars, invoking
> UnicodeBlock.of(char), and populate a set corresponding to the returned
> UnicodeBlock.

Again, though, we'd have to do it for every char class... I guess that
ain't bad ;)
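
The brute-force idea quoted above can be sketched roughly as follows.
This is only an illustration, not ANTLR code: the class name
UnicodeBlockSets and the method build() are made up for the example, and
it assumes a modern JDK for the collections and lambda syntax. A single
pass over all 16-bit chars fills the set for every block at once, so the
work does not have to be repeated per char class.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class UnicodeBlockSets {
    /**
     * Scan every 16-bit char once and group the chars by the
     * Character.UnicodeBlock that UnicodeBlock.of(char) reports.
     */
    public static Map<Character.UnicodeBlock, BitSet> build() {
        Map<Character.UnicodeBlock, BitSet> sets = new HashMap<>();
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            Character.UnicodeBlock block = Character.UnicodeBlock.of((char) c);
            if (block == null) continue; // char is not a member of any block
            sets.computeIfAbsent(block, b -> new BitSet()).set(c);
        }
        return sets;
    }

    public static void main(String[] args) {
        BitSet greekExt = build().get(Character.UnicodeBlock.GREEK_EXTENDED);
        System.out.println("GREEK_EXTENDED has " + greekExt.cardinality()
                + " chars, starting at U+"
                + Integer.toHexString(greekExt.nextSetBit(0)).toUpperCase());
    }
}
```

Looking up a set by a name such as GREEK_EXTENDED is then a plain map
lookup, e.g. via Character.UnicodeBlock.forName("GREEK_EXTENDED") on JDKs
that provide it.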

Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Lecturer in Comp. Sci., University of San Francisco
