[antlr-interest] Re: Is there an ANTLR trick/hack to specify "NEWLINE or EOF" in Lexer
Terence Parr
parrt at jguru.com
Wed Feb 5 14:05:47 PST 2003
Cool. OK, so let's say EOF can be a real char. What would this mean?
NONSENSE : (EOF)+ ;
Ter
On Wednesday, February 5, 2003, at 01:48 PM, Brian Smith wrote:
> Terence Parr wrote:
>> On Tuesday, February 4, 2003, at 04:39 PM, micheal_jor
>> <open.zone at virgin.net> wrote:
>
> >> Hmm... yeah, I'm not sure. What character would it be? We already use
> >> (char)-1 in Java, which I think is wrong since 0xFFFF is a valid char
> >> in some script. Any Unicode geniuses out there?
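A minimal Java sketch of why a char-typed EOF is ambiguous while an int-typed one is not. It assumes plain `java.io.Reader` input; the class and method names are hypothetical illustrations, not ANTLR's own code. `Reader.read()` returns 0..0xFFFF for a real char and -1 only at end of stream, so comparing the int catches EOF exactly, whereas narrowing to `char` first turns -1 into 0xFFFF and makes a genuine U+FFFF in the input look like EOF.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class EofSentinel {
    // Narrowing the int -1 to a 16-bit char yields 0xFFFF.
    static char eofAsChar() {
        return (char) -1;
    }

    // Keeps the int channel: read() gives 0..0xFFFF for chars and -1 only at
    // end of stream, so a real U+FFFF is still counted.
    static int countChars(Reader in) {
        try {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Narrows to char before comparing: a real U+FFFF now looks like EOF
    // and the loop stops early.
    static int countCharsNarrowed(Reader in) {
        try {
            int n = 0;
            while ((char) in.read() != (char) -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String input = "a\uFFFFb";
        System.out.println((int) eofAsChar());                           // 65535
        System.out.println(countChars(new StringReader(input)));         // 3
        System.out.println(countCharsNarrowed(new StringReader(input))); // 1
    }
}
```

This is the reason a sentinel outside the char range is safer: the ambiguity only appears once the value is squeezed into 16 bits.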
>
> 0xFFFF is not an assigned Unicode code point. It seems that the intent of
> the Unicode Consortium is to have this always be the case, so that the
> two's complement representation of -1 (e.g. 16-bit 0xFFFF, 32-bit
> 0xFFFFFFFF, 64-bit 0xFFFFFFFFFFFFFFFF, etc.) never represents a
> character.
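The claim can be checked from Java: -1 has an all-ones bit pattern at every width, and the JDK's Unicode tables report U+FFFF as unassigned (general category Cn). The sketch below is illustrative only; the class and helper names are hypothetical.

```java
public class MinusOneWidths {
    // Bit pattern of -1 at 16, 32 and 64 bits, printed as unsigned hex.
    static String hex16() { return Integer.toHexString(Short.toUnsignedInt((short) -1)); }
    static String hex32() { return Integer.toHexString(-1); }
    static String hex64() { return Long.toHexString(-1L); }

    public static void main(String[] args) {
        System.out.println(hex16()); // ffff
        System.out.println(hex32()); // ffffffff
        System.out.println(hex64()); // ffffffffffffffff
        // U+FFFF has no character assigned to it.
        System.out.println(Character.isDefined(0xFFFF));                        // false
        System.out.println(Character.getType(0xFFFF) == Character.UNASSIGNED); // true
    }
}
```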
>
> Actually, I believe the two-byte sequence 0xFF 0xFF is not valid UTF-8,
> UCS-2, UTF-16, UTF-32, or UCS-4, but I am not 100% sure about the 32-bit
> representations.
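The UTF-8 part of this is easy to confirm: the byte 0xFF never appears in well-formed UTF-8, so a strict decoder rejects 0xFF 0xFF outright. A minimal sketch using the JDK's `CharsetDecoder` with errors reported rather than replaced (the `Utf8Check` name is hypothetical); it does not test the UTF-16 or 32-bit cases.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 with no malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8(new byte[] {(byte) 0xFF, (byte) 0xFF}));  // false
        System.out.println(isValidUtf8("EOF".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```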
>
> http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?scr=86&b=0
>
> - Brian
>
> Your use of Yahoo! Groups is subject to
> http://docs.yahoo.com/info/terms/
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Lecturer in Comp. Sci., University of San Francisco
More information about the antlr-interest mailing list