[antlr-interest] Re: Is there an ANTLR trick/hack to specify "NEWLINE or EOF" in Lexer

Brian Smith brian-l-smith at uiowa.edu
Wed Feb 5 13:48:33 PST 2003


Terence Parr wrote:
> On Tuesday, February 4, 2003, at 04:39 PM, micheal_jor 
> <open.zone at virgin.net> wrote:

> Hmm....yeah, I'm not sure.  What character would it be?  We already use 
> (char)-1 in Java, which I think is wrong since 0xFFFF is a valid char 
> in some script.  Any unicode geniuses out there?

0xFFFF is not an assigned Unicode code point. It seems that the intent of 
the Unicode Consortium is to have this always be the case, so that the 
two's complement representation of -1 (e.g. 16-bit 0xFFFF, 32-bit 
0xFFFFFFFF, 64-bit 0xFFFFFFFFFFFFFFFF, etc.) never represents a character.
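
To make the point concrete, here is a minimal Java sketch of how (char)-1 
behaves. The class and method names are made up for illustration and are 
not ANTLR code; it only shows that the cast yields 0xFFFF, that Java's own 
Character tables report that value as undefined, and that the int -1 
returned by Reader.read() stays distinct from it until it is narrowed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class EofSentinelDemo {
    // Narrowing int -1 to a 16-bit char keeps only the low 16 bits: 0xFFFF.
    static char sentinel() {
        return (char) -1;
    }

    public static void main(String[] args) throws IOException {
        char eof = sentinel();

        // The sentinel is the code unit 0xFFFF.
        System.out.println(Integer.toHexString(eof));

        // Java's Unicode tables treat U+FFFF as having no assigned character.
        System.out.println(Character.isDefined(eof));

        // Reader.read() returns an int, so end of input (-1) is distinct
        // from every char value until it is cast down to char.
        Reader r = new StringReader("");
        int c = r.read();
        System.out.println(c == eof);        // compares -1 with 65535
        System.out.println((char) c == eof); // compares 0xFFFF with 0xFFFF
    }
}
```

Keeping the value as an int for as long as possible avoids relying on 
0xFFFF at all; the cast is only safe if 0xFFFF can never occur as input.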

Actually, I believe the two-byte sequence 0xFF 0xFF is not valid UTF-8, 
UCS-2, UTF-16, UTF-32, or UCS-4, but I am not 100% sure about the 32-bit 
representations.
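
The UTF-8 part of this can be checked with the JDK's strict decoder. The 
sketch below is only an illustration (the class and helper names are 
hypothetical), and it tests UTF-8 alone; it says nothing about the 16-bit 
or 32-bit encodings mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8FfCheck {
    // Returns true only if the bytes decode as UTF-8 without any error.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ff = { (byte) 0xFF, (byte) 0xFF };
        // The byte 0xFF never appears in well-formed UTF-8.
        System.out.println(isValidUtf8(ff));
        System.out.println(isValidUtf8("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```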

http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?scr=86&b=0

- Brian


 



