[antlr-interest] Re: Is there an ANTLR trick/hack to specify
"NEWLINE or EOF" in Lexer
Brian Smith
brian-l-smith at uiowa.edu
Wed Feb 5 13:48:33 PST 2003
Terence Parr wrote:
> On Tuesday, February 4, 2003, at 04:39 PM, micheal_jor
> <open.zone at virgin.net> wrote:
> Hmm....yeah, I'm not sure. What character would it be? We already use
> (char)-1 in Java, which I think is wrong since 0xFFFF is a valid char
> in some script. Any unicode geniuses out there?
0xFFFF is not an assigned Unicode code point; U+FFFF is designated a
noncharacter. It seems that the intent of the Unicode Consortium is to
have this always be the case, so that the two's complement representation
of -1 (e.g. 16-bit 0xFFFF, 32-bit 0xFFFFFFFF, 64-bit 0xFFFFFFFFFFFFFFFF,
etc.) never represents a character.
Actually, I believe the two-byte sequence 0xFF 0xFF never encodes a
character in UTF-8, UCS-2, or UTF-16 (the byte 0xFF cannot appear in
UTF-8 at all), but I am not 100% sure about the 32-bit representations,
UTF-32 and UCS-4.
http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?scr=86&b=0
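For anyone who wants to check this on their own JVM, here is a minimal sketch. It assumes a modern Java runtime (java.nio.charset.StandardCharsets, Java 7 or later); the class name EofSentinelCheck and the helper isValidUtf8 are made up for illustration and are not part of ANTLR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of ANTLR.
public class EofSentinelCheck {
    // The sentinel under discussion: -1 narrowed to Java's 16-bit char.
    static final char EOF_CHAR = (char) -1;

    // Returns true only if the bytes decode as strictly valid UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hex value of (char)-1.
        System.out.println(Integer.toHexString(EOF_CHAR));
        // Whether the JVM's Unicode tables treat that value as a defined character.
        System.out.println(Character.isDefined(EOF_CHAR));
        // Whether the byte pair 0xFF 0xFF is accepted as UTF-8.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xFF, (byte) 0xFF}));
    }
}
```

This should print ffff, then false twice: (char)-1 is 0xFFFF, the runtime does not consider it a defined character, and the byte pair is rejected by a strict UTF-8 decoder. It says nothing about the 32-bit encodings.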
- Brian