[antlr-interest] Recognizing 5th hex digit
David-Sarah Hopwood
david-sarah at jacaranda.org
Wed Aug 26 17:42:53 PDT 2009
Kieran Beltran wrote:
> Sam / Gavin, thank you.
>
> So, if I am receiving UTF-32 input, I would need to preprocess it
> (using the UTF-32 to UTF-16 algorithm), converting characters in the
> range U+10000 to U+10FFFF into surrogate pairs, and then pass that
> input to ANTLRInputStream.
Or just use 'new ANTLRInputStream(inputstream, "UTF-32")' (which converts
UTF-32 to UTF-16, at least in the current version of ANTLR).
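For anyone curious what that preprocessing amounts to, here is a minimal JDK-only sketch (the class and method names are hypothetical, not part of ANTLR). It assumes big-endian UTF-32 input without a byte-order mark, splits each code point in U+10000..U+10FFFF into a high/low surrogate pair, and checks the result against the JDK's own "UTF-32" decoder, which is the kind of conversion the encoding-name route above relies on.

```java
import java.nio.charset.Charset;

public class Utf32ToUtf16 {
    /** Decodes big-endian UTF-32 bytes (assumed: no BOM) into a UTF-16 Java String. */
    static String decodeUtf32BE(byte[] bytes) {
        if (bytes.length % 4 != 0) {
            throw new IllegalArgumentException("UTF-32 input length must be a multiple of 4");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i += 4) {
            // Assemble one 32-bit code unit from four big-endian bytes.
            int cp = ((bytes[i] & 0xFF) << 24) | ((bytes[i + 1] & 0xFF) << 16)
                   | ((bytes[i + 2] & 0xFF) << 8) | (bytes[i + 3] & 0xFF);
            // Reject values outside Unicode and lone surrogate code points.
            if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                throw new IllegalArgumentException("invalid code point: " + Integer.toHexString(cp));
            }
            if (cp < 0x10000) {
                sb.append((char) cp);                      // BMP: one UTF-16 code unit
            } else {
                int v = cp - 0x10000;                      // 20-bit offset
                sb.append((char) (0xD800 + (v >> 10)));    // high surrogate: top 10 bits
                sb.append((char) (0xDC00 + (v & 0x3FF)));  // low surrogate: bottom 10 bits
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D538 (arithmos) followed by 'x' (U+0078), big-endian UTF-32
        byte[] input = {0x00, 0x01, (byte) 0xD5, 0x38, 0x00, 0x00, 0x00, 0x78};
        String manual = decodeUtf32BE(input);
        String viaJdk = new String(input, Charset.forName("UTF-32"));
        System.out.println(manual.equals(viaJdk));           // true
        System.out.println(manual.equals("\uD835\uDD38x"));  // true
    }
}
```

The explicit bit arithmetic is only illustrative; StringBuilder.appendCodePoint(cp) or Character.toChars(cp) performs the same split in one call.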
> In my lexer definition, where appropriate, I would define the tokens to
> recognize the surrogate pairs, for example:
>
> fragment ARITHMOS: '\uD835\uDD38'; // recognize UTF-32 (0001 D538) arithmos
> fragment FINSET: '\uD835\uDD3D'; // recognize UTF-32 (0001 D53D) finite set
>
> As indicated, this applies only to the Java target.
>
> Have I got it right?
Yes.
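To double-check surrogate-pair literals such as those in ARITHMOS and FINSET, one option is to let the JDK compute the UTF-16 code units for a code point and print them in the same quoted escape form the grammar uses. This is a small sketch; the helper name antlrLiteral is hypothetical and not an ANTLR API.

```java
public class SurrogateLiteral {
    /** Formats a code point as a quoted literal of UTF-16 escapes, as used in the lexer rules. */
    static String antlrLiteral(int codePoint) {
        StringBuilder sb = new StringBuilder("'");
        // Character.toChars yields 1 char for BMP code points, 2 (a surrogate pair) otherwise.
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.append('\'').toString();
    }

    public static void main(String[] args) {
        System.out.println(antlrLiteral(0x1D538)); // arithmos
        System.out.println(antlrLiteral(0x1D53D)); // finite set
    }
}
```

Running it prints '\uD835\uDD38' and '\uD835\uDD3D', which match the two fragment definitions in the quoted message.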
--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com