[antlr-interest] Recognizing 5-th hex digit

David-Sarah Hopwood david-sarah at jacaranda.org
Wed Aug 26 17:42:53 PDT 2009


Kieran Beltran wrote:
> Sam / Gavin, thank you.
> 
> So, in the case where I am receiving UTF-32 input, I would need to
> preprocess it (using the UTF-32-->UTF-16 algorithm), converting characters
> in the 10000 to 10FFFF range into surrogate pairs, and pass that input to
> ANTLRInputStream.

Or just use 'new ANTLRInputStream(inputstream, "UTF-32")' (which converts
UTF-32 to UTF-16, at least in the current version of ANTLR).
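To see what that conversion produces, here is a minimal sketch that uses only
the JDK, not ANTLR. It assumes the JDK in use ships a "UTF-32" charset (which
the Java spec does not require, although the usual JDKs have one) and that
input without a byte order mark is read as big-endian. The class and method
names (Utf32Decode, decodeUtf32, hexUnits) are made up for illustration. I am
not claiming this is how ANTLRInputStream is implemented, only that the
resulting chars should look like this.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class Utf32Decode {
    // Decode UTF-32 bytes into a Java String, i.e. a sequence of UTF-16 code units.
    // Assumes the running JDK provides a charset named "UTF-32".
    static String decodeUtf32(byte[] utf32Bytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf32Bytes), "UTF-32")) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException if "UTF-32" is missing.
            throw new UncheckedIOException(e);
        }
    }

    // Render each UTF-16 code unit as four hex digits, separated by spaces.
    static String hexUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Code point 1D538 as four big-endian UTF-32 bytes, no byte order mark.
        byte[] input = {0x00, 0x01, (byte) 0xD5, 0x38};
        System.out.println(hexUnits(decodeUtf32(input)));
    }
}
```

The single 32-bit input character comes out as two chars, D835 DD38, which is
the pair the lexer rule has to match.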

> In my lexer definition, where appropriate, I would define the tokens to
> recognize the surrogate pairs for example:
> 
> fragment ARITHMOS: '\uD835\uDD38'; // recognize UTF-32 (0001 D538) arithmos
> fragment FINSET: '\uD835\uDD3D';   // recognize UTF-32 (0001 D53D) finite set
> 
> As indicated, this is only for Java targets.
> 
> Have I got it right?

Yes.
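If you want to double-check pairs like these rather than work them out by
hand, something along the following lines should do. It is only a sketch: the
class and method names (SurrogatePairs, manualPair, utf16Escapes) are
hypothetical, and it simply compares the standard UTF-16 arithmetic against
the JDK's Character.toChars.

```java
public class SurrogatePairs {
    // Standard UTF-16 encoding of a supplementary code point (10000..10FFFF):
    // subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.
    static char[] manualPair(int codePoint) {
        int v = codePoint - 0x10000;
        char high = (char) (0xD800 + (v >> 10));
        char low = (char) (0xDC00 + (v & 0x3FF));
        return new char[] {high, low};
    }

    // Format a code point's UTF-16 code units as escape text for a lexer rule,
    // using the JDK's own conversion.
    static String utf16Escapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codePoints = {0x1D538, 0x1D53D};
        for (int cp : codePoints) {
            char[] byHand = manualPair(cp);
            char[] byJdk = Character.toChars(cp);
            boolean same = byHand[0] == byJdk[0] && byHand[1] == byJdk[1];
            System.out.println(Integer.toHexString(cp).toUpperCase()
                    + " -> " + utf16Escapes(cp) + " (manual matches JDK: " + same + ")");
        }
    }
}
```

For 1D538 and 1D53D this gives the same two pairs as in your ARITHMOS and
FINSET rules.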

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com


