[antlr-interest] Recognizing 5-th hex digit
Gavin Lambert
antlr at mirality.co.nz
Wed Aug 26 14:16:10 PDT 2009
At 07:35 27/08/2009, Kieran Beltran wrote:
>I have encountered a problem when attempting to recognize two
>required Standard Z symbols which are "above" the four-hex set
>recognized by my generated lexer. The two symbols are \u1D538 and
>\u1D53D.
[...]
>Is the solution to include a fifth digit to be recognized
>optionally? Could I simply replace line 495 (as below) and add a
>new fragment
>
>'u' ZDIGIT? XDIGIT XDIGIT XDIGIT XDIGIT
No. It also depends on the stream encoding. IIRC the Java target
at least holds its input as UTF-16 code units, each of which only
covers \u0000 to \uFFFF. So there's no "room" in a single
character to store that fifth digit.
Instead, you need to encode it as a surrogate pair. \u1D538, for
example, would be encoded as \uD835\uDD38.
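If you don't want to work the pairs out by hand, here is a rough
sketch in plain Java (nothing ANTLR-specific) that prints the
escape pair for a given code point. The class and method names
are made up for illustration; Character.toChars and
Character.isSupplementaryCodePoint are standard JDK calls.

```java
public class SurrogatePairDemo {
    // Formats a code point as backslash-u escapes: one escape for a
    // BMP character, two (high then low surrogate) for anything
    // above U+FFFF.
    public static String toEscapedSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            return String.format("\\u%04X", codePoint);
        }
        // toChars returns the UTF-16 code units for the code point;
        // for a supplementary code point that is exactly two chars.
        char[] units = Character.toChars(codePoint);
        return String.format("\\u%04X\\u%04X", (int) units[0], (int) units[1]);
    }

    public static void main(String[] args) {
        System.out.println(toEscapedSurrogates(0x1D538)); // prints \uD835\uDD38
        System.out.println(toEscapedSurrogates(0x1D53D)); // prints \uD835\uDD3D
    }
}
```

Both of your symbols share the same high surrogate (\uD835), so
only the low surrogate differs between them.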
I'm not entirely sure how it works in the C target, which uses
UTF-32 encoding by default; I've never really needed to use
characters that high up.