[antlr-interest] unicode support
Tim Anderson
tima at intalio.com
Mon Dec 16 16:33:41 PST 2002
> From: Terence Parr [mailto:parrt at jguru.com]
>
>
> A few things that would be interesting to add:
>
> Allow you to reference sets like JAVA_IDENTIFIER or LATIN_... and then
> characters like 'GREATER-THAN SIGN' and 'APOSTROPHE-QUOTE'. The latter
That would be cool. For the OpenJMS selector grammar, I currently have
protected rules corresponding to the Character.isJavaIdentifierStart() and
Character.isJavaIdentifierPart() methods; being able to replace these
long (and non-obvious) rules would be great.
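Until such named sets exist, those long rules could at least be generated rather than written by hand. Below is a minimal sketch, assuming Java 8 or later and covering only the 16-bit chars (U+0000..U+FFFF). It asks Character.isJavaIdentifierStart() about every char, collapses the answers into contiguous ranges, and prints them roughly in the shape of an ANTLR 2 lexer rule. The class name `JavaIdentifierRanges` and the rule name `JAVA_ID_START` are hypothetical. Passing `Character::isJavaIdentifierPart` instead gives the other rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class JavaIdentifierRanges {

    // Collects maximal runs [lo, hi] of 16-bit chars for which pred holds.
    static List<int[]> ranges(IntPredicate pred) {
        List<int[]> out = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a run"
        for (int c = 0; c <= 0xFFFF; c++) {
            boolean in = pred.test(c);
            if (in && start < 0) {
                start = c;                          // a new run begins at c
            } else if (!in && start >= 0) {
                out.add(new int[] {start, c - 1});  // the run ended at c - 1
                start = -1;
            }
        }
        if (start >= 0) {
            out.add(new int[] {start, 0xFFFF});     // run reaches the last char
        }
        return out;
    }

    // Formats the runs as char-range alternatives, one per line.
    static String toAntlr(List<int[]> runs) {
        StringBuilder sb = new StringBuilder();
        for (int[] r : runs) {
            if (sb.length() > 0) {
                sb.append("\n    | ");
            }
            sb.append(String.format("'\\u%04x'", r[0]));
            if (r[1] != r[0]) {
                sb.append(String.format("..'\\u%04x'", r[1]));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> start = ranges(Character::isJavaIdentifierStart);
        System.out.println("protected JAVA_ID_START\n    : " + toAntlr(start) + "\n    ;");
    }
}
```

The generated rule reflects the Unicode tables of whichever JDK runs the generator, so it is worth regenerating when the JDK changes.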
> would be easy: just a hashtable lookup if I can find the unicode char
> index in Java somewhere ;) The former is harder as there is nothing in
> Java's Character.java class that lets me get a set of chars for say
> GREEK_EXTENDED. Anybody know a good library that would give me a set
> of chars from these char class names? I've just found:
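For the name lookup, later JDKs (Java 7 and up, well after this thread) expose Unicode character names through `Character.getName(int)`. That means the hashtable could be filled by walking the chars once. Below is a minimal sketch, assuming Java 8 and covering only U+0000..U+FFFF; the class name `CharNameTable` is hypothetical. `getName` returns current Unicode names, so 'APOSTROPHE-QUOTE' (the Unicode 1.0 name of U+0027) is found under `APOSTROPHE`.

```java
import java.util.HashMap;
import java.util.Map;

public class CharNameTable {

    // Builds a name -> char code table from Character.getName (Java 7+).
    static Map<String, Integer> buildNameTable() {
        Map<String, Integer> table = new HashMap<>();
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            String name = Character.getName(c); // null for unassigned chars
            if (name != null) {
                table.put(name, c);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = buildNameTable();
        System.out.println(table.get("GREATER-THAN SIGN"));
        System.out.println(table.get("APOSTROPHE"));
    }
}
```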
You could derive them using Character.UnicodeBlock and a bit of brute force,
i.e., iterate through all possible chars, invoke UnicodeBlock.of(char) on each,
and add the char to a set corresponding to the returned UnicodeBlock.
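The brute-force approach can be sketched as follows. This is a minimal version, assuming Java 8 or later (newer than the JDK of the time) and covering only the 16-bit chars. It keeps one BitSet per UnicodeBlock. `Character.UnicodeBlock.forName` then lets a grammar tool look a set up by a name such as "GREEK_EXTENDED". The class name `UnicodeBlockSets` and the helper `charsOf` are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class UnicodeBlockSets {

    // Brute force: ask Character.UnicodeBlock.of for the block of every
    // 16-bit char and record that char in a BitSet kept per block.
    static Map<Character.UnicodeBlock, BitSet> buildBlockSets() {
        Map<Character.UnicodeBlock, BitSet> sets = new HashMap<>();
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            Character.UnicodeBlock block = Character.UnicodeBlock.of((char) c);
            if (block == null) {
                continue; // char lies outside every defined block
            }
            sets.computeIfAbsent(block, b -> new BitSet()).set(c);
        }
        return sets;
    }

    // Looks a set up by a name such as "GREEK_EXTENDED"; forName throws
    // IllegalArgumentException for names it does not recognize.
    static BitSet charsOf(Map<Character.UnicodeBlock, BitSet> sets, String blockName) {
        BitSet chars = sets.get(Character.UnicodeBlock.forName(blockName));
        return chars == null ? new BitSet() : chars;
    }

    public static void main(String[] args) {
        Map<Character.UnicodeBlock, BitSet> sets = buildBlockSets();
        BitSet greekExt = charsOf(sets, "GREEK_EXTENDED");
        System.out.printf("GREEK_EXTENDED: %d chars, U+%04X..U+%04X%n",
                greekExt.cardinality(), greekExt.nextSetBit(0), greekExt.length() - 1);
    }
}
```

Blocks are fixed code point ranges, so a set built this way also contains the unassigned chars inside a block. Intersecting it with a property test such as Character.isLetter() would narrow it down if needed.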
Regards,
Tim