[antlr-interest] Unicode XID_Start/XID_Continue? (And other Unicode properties)
David Holroyd
dave at badgers-in-foil.co.uk
Sat Jul 5 12:28:37 PDT 2008
On Sat, Jul 05, 2008 at 06:37:32PM +0200, Joe wrote:
> Are Unicode properties supported by ANTLR in any way? It would be nice
> to be able to simply lex Unicode identifiers as
>   ID : XID_Start XID_Continue*
> Or would I have to write a script that generates the appropriate lexer
> fragments from
> http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt ?
Here's what I hacked up to do something like that, using ICU4J:
http://lists.badgers-in-foil.co.uk/pipermail/metaas-dev/attachments/20070307/abfef6e7/UnicodeIdentifierGenerator.java
If the script doesn't already do what you want, I think the ICU
UCharacter[1] class would let you test code points against the
XID_Start/XID_Continue properties[2] in the same way.
[1] http://www.icu-project.org/apiref/icu4j/com/ibm/icu/lang/UCharacter.html
[2] http://www.icu-project.org/apiref/icu4j/com/ibm/icu/lang/UProperty.html#XID_CONTINUE
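A rough sketch of the general approach, not the contents of the linked generator: walk the code points, ask a predicate whether each one is in the set, collapse the hits into ranges, and print them as an ANTLR lexer fragment. The class and method names here are made up for illustration. To keep it runnable without ICU4J on the classpath, main() uses the JDK's Character.isUnicodeIdentifierStart/isUnicodeIdentifierPart as stand-ins, which are not the same sets as XID_Start/XID_Continue. With ICU4J you would presumably pass cp -> UCharacter.hasBinaryProperty(cp, UProperty.XID_START) (and XID_CONTINUE) instead. The 0xFFFF limit assumes a lexer that only sees 16-bit chars.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of ANTLR or ICU4J.
public class UnicodeFragmentSketch {

    /** Collapses every code point in [0, maxCodePoint] accepted by inSet into inclusive {first, last} ranges. */
    public static List<int[]> ranges(IntPredicate inSet, int maxCodePoint) {
        List<int[]> out = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a run"
        for (int cp = 0; cp <= maxCodePoint; cp++) {
            if (inSet.test(cp)) {
                if (start < 0) start = cp;
            } else if (start >= 0) {
                out.add(new int[] {start, cp - 1});
                start = -1;
            }
        }
        if (start >= 0) out.add(new int[] {start, maxCodePoint});
        return out;
    }

    /** Renders the ranges as one ANTLR lexer fragment rule, one alternative per range. */
    public static String fragment(String name, List<int[]> ranges) {
        StringBuilder sb = new StringBuilder("fragment " + name + "\n");
        String sep = "    : ";
        for (int[] r : ranges) {
            sb.append(sep).append(literal(r[0]));
            if (r[1] > r[0]) sb.append("..").append(literal(r[1]));
            sb.append('\n');
            sep = "    | ";
        }
        return sb.append("    ;\n").toString();
    }

    /** Formats a code point as a quoted four-hex-digit escaped char literal. */
    static String literal(int cp) {
        return String.format("'\\u%04X'", cp);
    }

    public static void main(String[] args) {
        // Stand-in predicates from the JDK; swap in the ICU4J property tests
        // mentioned above to get the real XID sets.
        System.out.print(fragment("XID_Start",
                ranges(Character::isUnicodeIdentifierStart, 0xFFFF)));
        System.out.print(fragment("XID_Continue",
                ranges(Character::isUnicodeIdentifierPart, 0xFFFF)));
    }
}
```

The output can be pasted into the grammar next to the rule from the question, ID : XID_Start XID_Continue* ; since both names are declared as fragments they never become tokens on their own.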
ta,
dave
--
http://david.holroyd.me.uk/