[antlr-interest] Unicode XID_Start/XID_Continue? (And, other Unicode properties)
Joe
l0calh05t at gmx.net
Sat Jul 5 15:47:47 PDT 2008
>> Are Unicode properties supported by ANTLR in any way? It would be nice
>> to be able to simply lex Unicode identifiers as
>> ID : XID_Start XID_Continue*
>> Or would I have to write a script that creates the appropriate lexer
>> fragments from
>> http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt ?
>>
>
> Here's what I hacked up to do something like that, using ICU4J,
>
> http://lists.badgers-in-foil.co.uk/pipermail/metaas-dev/attachments/20070307/abfef6e7/UnicodeIdentifierGenerator.java
>
> I think the ICU UCharacter[1] class would allow codepoints to be tested
> against the XID* properties[2] in the same way, if the script doesn't
> already do what you want.
>
> [1] http://www.icu-project.org/apiref/icu4j/com/ibm/icu/lang/UCharacter.html
> [2] http://www.icu-project.org/apiref/icu4j/com/ibm/icu/lang/UProperty.html#XID_CONTINUE
>
>
> ta,
> dave
>
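For the script route, here is a minimal sketch in Java. It assumes the usual DerivedCoreProperties.txt line format (`XXXX..YYYY ; Property # comment`), collects the ranges for one property, and prints them as an ANTLR 3 lexer fragment. The class and method names (XidFragmentGenerator, parse, toFragment) are hypothetical. It also assumes the lexer's char vocabulary is 16-bit, so ranges above U+FFFF are skipped rather than encoded as surrogate pairs.

```java
import java.util.ArrayList;
import java.util.List;

public class XidFragmentGenerator {

    /**
     * Collects the inclusive [lo, hi] code point ranges listed for one
     * property in lines formatted like DerivedCoreProperties.txt.
     */
    static List<int[]> parse(List<String> lines, String property) {
        List<int[]> ranges = new ArrayList<>();
        for (String raw : lines) {
            // Strip the trailing "# ..." comment; skip blank and comment-only lines.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            if (line.isEmpty()) continue;
            // Field 0 is "XXXX" or "XXXX..YYYY"; field 1 is the property name.
            String[] fields = line.split(";");
            if (fields.length < 2 || !fields[1].trim().equals(property)) continue;
            String cps = fields[0].trim();
            int dots = cps.indexOf("..");
            int lo = Integer.parseInt(dots >= 0 ? cps.substring(0, dots) : cps, 16);
            int hi = dots >= 0 ? Integer.parseInt(cps.substring(dots + 2), 16) : lo;
            ranges.add(new int[] {lo, hi});
        }
        return ranges;
    }

    /** Renders the ranges as an ANTLR 3 fragment rule of char-literal alternatives. */
    static String toFragment(String name, List<int[]> ranges) {
        StringBuilder sb = new StringBuilder("fragment " + name + "\n");
        String sep = "    : ";
        for (int[] r : ranges) {
            // Assumes a 16-bit char vocabulary: ranges starting above U+FFFF
            // are skipped and ranges crossing it are clipped.
            if (r[0] > 0xFFFF) continue;
            int hi = Math.min(r[1], 0xFFFF);
            sb.append(sep).append(literal(r[0]));
            if (hi != r[0]) sb.append("..").append(literal(hi));
            sb.append('\n');
            sep = "    | ";
        }
        return sb.append("    ;\n").toString();
    }

    /** Formats a BMP code point as an escaped ANTLR char literal. */
    private static String literal(int cp) {
        return String.format("'\\u%04X'", cp);
    }

    public static void main(String[] args) {
        // A few hand-written lines in the file's format (not a verbatim excerpt).
        List<String> sample = List.of(
            "# Derived Property: XID_Start",
            "0041..005A    ; XID_Start # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
            "00AA          ; XID_Start # Lo       FEMININE ORDINAL INDICATOR",
            "10000..1000B  ; XID_Start # Lo  [12] LINEAR B SYLLABLE B008 A..LINEAR B SYLLABLE B046 JE",
            "0030..0039    ; XID_Continue # Nd  [10] DIGIT ZERO..DIGIT NINE");
        System.out.print(toFragment("XID_START", parse(sample, "XID_Start")));
    }
}
```

Run on the sample, main should print a XID_START fragment with the alternatives '\u0041'..'\u005A' and '\u00AA'; the Linear B range is dropped and the XID_Continue line is ignored. Feeding it the full file and calling it again with "XID_Continue" would give the second fragment for ID : XID_START XID_CONTINUE*.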
So they are unsupported. And apparently UTF-16 isn't even really
supported. Shouldn't this be fairly easy to implement? The Java
version of LA() already returns an int, so why not add UTF-16 (surrogate
pair) decoding to it? And properties could be implemented via ICU.
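To make the UTF-16 point concrete, here is a minimal sketch of an LA() that decodes surrogate pairs, so LA(1) yields a whole code point and consume() steps over both halves of a pair. CodePointStream and its methods are hypothetical, not ANTLR's CharStream API. The property tests use the JDK's Character.isUnicodeIdentifierStart/isUnicodeIdentifierPart as a stand-in: they are close to, but not identical to, XID_Start/XID_Continue (isUnicodeIdentifierPart also accepts identifier-ignorable characters, for example). With ICU4J available, UCharacter.hasBinaryProperty(cp, UProperty.XID_START) and UProperty.XID_CONTINUE would test the exact properties.

```java
public class CodePointStream {
    public static final int EOF = -1;

    private final String text;
    private int index = 0; // position in UTF-16 code units (Java chars)

    public CodePointStream(String text) {
        this.text = text;
    }

    /** Returns the i-th code point ahead (i >= 1, as in ANTLR's LA), or EOF. */
    public int LA(int i) {
        int pos = index;
        for (int k = 1; k < i; k++) {
            if (pos >= text.length()) return EOF;
            // Skip one whole code point: 2 chars for a surrogate pair, else 1.
            pos += Character.charCount(text.codePointAt(pos));
        }
        // codePointAt joins a high/low surrogate pair into one int.
        return pos < text.length() ? text.codePointAt(pos) : EOF;
    }

    /** Advances past exactly one code point. */
    public void consume() {
        if (index < text.length()) {
            index += Character.charCount(text.codePointAt(index));
        }
    }

    public int index() {
        return index;
    }

    // JDK-only approximations of XID_Start / XID_Continue (see lead-in).
    static boolean isIdStart(int cp) {
        return cp != EOF && Character.isUnicodeIdentifierStart(cp);
    }

    static boolean isIdContinue(int cp) {
        return cp != EOF && Character.isUnicodeIdentifierPart(cp);
    }

    /** Matches ID : IdStart IdContinue* at the current position, or returns null. */
    public String matchIdentifier() {
        if (!isIdStart(LA(1))) return null;
        int start = index;
        do {
            consume();
        } while (isIdContinue(LA(1)));
        return text.substring(start, index);
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I lies outside the BMP, so it
        // occupies two Java chars (a surrogate pair).
        String src = "x" + new String(Character.toChars(0x10400)) + "y = 1";
        CodePointStream in = new CodePointStream(src);
        String id = in.matchIdentifier();
        // prints "3 code points, 4 chars"
        System.out.println(id.codePointCount(0, id.length()) + " code points, "
                + id.length() + " chars");
    }
}
```

The stream keeps its index in chars, so substring() still works on the original String; only lookahead and consume() are code-point aware, which is the change being asked for in LA().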
--
Generally speaking, things have gone about as far
as they can possibly go, when things have gotten
about as bad as they can reasonably get.