[antlr-interest] Unicode XID_Start/XID_Continue? (And other Unicode properties)

David Holroyd dave at badgers-in-foil.co.uk
Sat Jul 5 12:28:37 PDT 2008


On Sat, Jul 05, 2008 at 06:37:32PM +0200, Joe wrote:
> Are Unicode properties supported by Antlr in any way? It would be nice 
> to be able to simply lex unicode identifiers as ID : XID_Start 
> XID_Continue*
> Or would I have to write a script that creates the appropriate lexer 
> fragments from 
> http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt ?

Here's what I hacked up to do something like that, using ICU4J:

  http://lists.badgers-in-foil.co.uk/pipermail/metaas-dev/attachments/20070307/abfef6e7/UnicodeIdentifierGenerator.java

If the script doesn't already do what you want, I think ICU's UCharacter[1]
class would let you test code points against the XID_Start and
XID_Continue properties[2] in the same way.
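For anyone who would rather avoid the ICU4J dependency, here is a minimal
sketch of the same idea using only the JDK. It is an approximation, not
the script linked above. It swaps in java.lang.Character's
isUnicodeIdentifierStart/isUnicodeIdentifierPart for the XID properties.
These roughly follow ID_Start/ID_Continue rather than the XID_ variants,
and the "part" test also accepts identifier-ignorable control characters,
which the sketch filters out. It only scans the BMP, assuming a lexer that
matches 16-bit chars as ANTLR 3's Java target does. Consecutive code
points are merged into ranges and printed as fragment rules. The class
and rule names (IdentifierFragmentSketch, ID_START, ID_PART) are made up
for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Sketch only: uses JDK Character predicates as a stand-in for
// ICU4J's XID_Start/XID_Continue (close, but not identical).
public class IdentifierFragmentSketch {

    // Approximation of XID_Start (hypothetical rule name ID_START).
    static final IntPredicate ID_START = Character::isUnicodeIdentifierStart;

    // Approximation of XID_Continue (hypothetical rule name ID_PART);
    // drops the identifier-ignorable controls the JDK method accepts.
    static final IntPredicate ID_PART =
        cp -> Character.isUnicodeIdentifierPart(cp) && !Character.isIdentifierIgnorable(cp);

    // Collect maximal [lo, hi] runs of BMP code points matching the predicate.
    static List<int[]> ranges(IntPredicate p) {
        List<int[]> out = new ArrayList<>();
        int start = -1;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            boolean in = p.test(cp);
            if (in && start < 0) {
                start = cp;                          // open a new run
            } else if (!in && start >= 0) {
                out.add(new int[] {start, cp - 1});  // close the current run
                start = -1;
            }
        }
        if (start >= 0) {
            out.add(new int[] {start, 0xFFFF});      // run reaching the end of the BMP
        }
        return out;
    }

    // Format one code point as an ANTLR 3 char literal with a 4-digit hex escape.
    static String lit(int cp) {
        return String.format("'\\u%04X'", cp);
    }

    // Render a fragment rule with one alternative per run:
    // a range for runs longer than one, a single literal otherwise.
    static String fragment(String name, IntPredicate p) {
        StringBuilder sb = new StringBuilder("fragment ").append(name).append("\n");
        List<int[]> rs = ranges(p);
        for (int i = 0; i < rs.size(); i++) {
            int[] r = rs.get(i);
            sb.append(i == 0 ? "    : " : "    | ");
            sb.append(lit(r[0]));
            if (r[1] != r[0]) {
                sb.append("..").append(lit(r[1]));
            }
            sb.append('\n');
        }
        return sb.append("    ;\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(fragment("ID_START", ID_START));
        System.out.print(fragment("ID_PART", ID_PART));
    }
}
```

The output could then be pasted into a grammar and used as
ID : ID_START ID_PART* ; . With ICU4J on the classpath, the two predicates
could be replaced by cp -> UCharacter.hasBinaryProperty(cp, UProperty.XID_START)
and the matching UProperty.XID_CONTINUE test, which should give the exact
properties asked about (untested here).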

[1] http://www.icu-project.org/apiref/icu4j/com/ibm/icu/lang/UCharacter.html
[2] http://www.icu-project.org/apiref/icu4j/com/ibm/icu/lang/UProperty.html#XID_CONTINUE


ta,
dave

-- 
http://david.holroyd.me.uk/

