[antlr-interest] Unicode XID_Start/XID_Continue? (And, other Unicode properties)

Gavin Lambert antlr at mirality.co.nz
Sat Jul 5 15:59:48 PDT 2008


At 10:47 6/07/2008, Joe wrote:
 >So they are unsupported. And apparently UTF-16 isn't even really
 >supported. Shouldn't this stuff be fairly easy to implement? The
 >java version of LA already returns an int, so why not add UTF-16
 >decoding to it? And properties could be implemented via ICU

You cannot put literal Unicode characters directly in an ANTLR
grammar, because grammars are parsed by ANTLR 2, which doesn't
understand Unicode.  ANTLR 3 itself handles Unicode just fine,
though.  As long as you write the characters as Unicode escapes
in your grammar, you shouldn't have problems defining rules for
any characters you want to recognise in your own lexers.
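
On the property side of the question, here is a minimal sketch of how
an identifier check could be done in plain Java, outside the grammar.
It decodes UTF-16 surrogate pairs with String.codePointAt and uses the
JDK's Character.isUnicodeIdentifierStart / isUnicodeIdentifierPart.
Those JDK methods are only an approximation of XID_Start and
XID_Continue, not an exact match, and the class and method names
(UnicodeIdentifierCheck, isIdentifier) are made up for illustration.
None of this is part of ANTLR.

```java
public class UnicodeIdentifierCheck {

    /**
     * Hypothetical helper: true if s is non-empty, starts with an
     * identifier-start code point, and continues with identifier-part
     * code points. Uses the JDK's own classification, which only
     * approximates XID_Start / XID_Continue.
     */
    public static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        // codePointAt combines a surrogate pair into one code point.
        int first = s.codePointAt(0);
        if (!Character.isUnicodeIdentifierStart(first)) {
            return false;
        }
        // charCount is 2 for supplementary code points, 1 otherwise.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // Latin-1 letter written as an escape, as in a grammar.
        System.out.println(isIdentifier("caf\u00E9"));
        // U+1D465 (a supplementary-plane letter) as a surrogate pair.
        System.out.println(isIdentifier("\uD835\uDC65y"));
        // A leading digit is not an identifier start.
        System.out.println(isIdentifier("1abc"));
    }
}
```

Something like this could be called from a lexer action or predicate,
but how that is wired up depends on the target and is left out here.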


