[antlr-interest] unicode support

Tim Anderson tima at intalio.com
Mon Dec 16 16:33:41 PST 2002


> From: Terence Parr [mailto:parrt at jguru.com]
>
>
> A few things that would be interesting to add:
>
> Allow you to reference sets like JAVA_IDENTIFIER or LATIN_... and then
> characters like 'GREATER-THAN SIGN' and 'APOSTROPHE-QUOTE'.  The latter

That would be cool. For the OpenJMS selector grammar I currently have
protected rules corresponding to the Character.isJavaIdentifierStart() and
Character.isJavaIdentifierPart() methods - being able to replace these
long (and non-obvious) rules would be great.
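
As a rough sketch of how such rules could be generated rather than written
by hand: the code below walks the 16-bit char range and collapses the
characters accepted by Character.isJavaIdentifierStart() into contiguous
ranges, which map fairly directly onto lexer alternatives. The class name
IdentifierRanges and the ranges() helper are made up for illustration, and
the code assumes a current JDK (lambdas, java.util.function). The exact
ranges depend on the Unicode version of the JDK in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of ANTLR or OpenJMS.
public class IdentifierRanges {

    // Collapses all 16-bit chars accepted by 'member' into inclusive
    // [first, last] ranges, in ascending order.
    static List<int[]> ranges(IntPredicate member) {
        List<int[]> out = new ArrayList<>();
        int start = -1; // start of the range being built, or -1 if none
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            boolean in = member.test(c);
            if (in && start < 0) {
                start = c;                          // a new range opens here
            } else if (!in && start >= 0) {
                out.add(new int[] {start, c - 1});  // range closed at c - 1
                start = -1;
            }
        }
        if (start >= 0) {
            out.add(new int[] {start, Character.MAX_VALUE});
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> starts =
            ranges(c -> Character.isJavaIdentifierStart((char) c));
        // Print each range; single-character ranges have equal endpoints.
        for (int[] r : starts) {
            System.out.println(String.format("U+%04X..U+%04X", r[0], r[1]));
        }
    }
}
```

Passing c -> Character.isJavaIdentifierPart((char) c) instead gives the
ranges for the second rule.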

> would be easy: just a hashtable lookup if I can find the unicode char
> index in Java somewhere ;)  The former is harder as there is nothing in
> Java's Character.java class that lets me get a set of chars for say
> GREEK_EXTENDED.  Anybody know a good library that would give me a set
> of chars from these char class names?  I've just found:

You could derive them using Character.UnicodeBlock and a bit of brute
force, i.e. iterate through all possible chars, invoking
UnicodeBlock.of(char), and populate a set corresponding to the returned
UnicodeBlock.
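
A minimal sketch of that brute-force approach, assuming a current JDK
(Map.computeIfAbsent and lambdas). The class name UnicodeBlockSets and the
build() method are invented for the example; Character.UnicodeBlock.of(char)
and the GREEK_EXTENDED constant are the standard JDK ones. It covers only
16-bit chars, and since UnicodeBlock.of() works on block boundaries, each
set also contains positions in the block that have no assigned character.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of ANTLR.
public class UnicodeBlockSets {

    // Maps each Unicode block to the set of 16-bit chars that fall in it.
    static Map<Character.UnicodeBlock, BitSet> build() {
        Map<Character.UnicodeBlock, BitSet> sets = new HashMap<>();
        // Use an int counter so the loop ends after Character.MAX_VALUE.
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            Character.UnicodeBlock block = Character.UnicodeBlock.of((char) c);
            if (block == null) {
                continue; // char is not a member of any defined block
            }
            sets.computeIfAbsent(block, b -> new BitSet()).set(c);
        }
        return sets;
    }

    public static void main(String[] args) {
        Map<Character.UnicodeBlock, BitSet> sets = build();
        BitSet greek = sets.get(Character.UnicodeBlock.GREEK_EXTENDED);
        System.out.println("GREEK_EXTENDED: " + greek.cardinality()
            + " chars, starting at U+"
            + Integer.toHexString(greek.nextSetBit(0)).toUpperCase());
    }
}
```

A BitSet keeps the sets small and makes it cheap to turn a block into
character ranges afterwards with nextSetBit() and nextClearBit().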

Regards,

Tim

