[antlr-interest] Re: Is there an ANTLR trick/hack to specify "NEWLINE or EOF" in Lexer

Terence Parr parrt at jguru.com
Wed Feb 5 14:05:47 PST 2003


Cool.  Ok, so let's say EOF can be a real char.  What does this mean?

NONSENSE : (EOF)+ ;
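
One way to see why this rule is troubling, as a minimal Java sketch rather than anything ANTLR actually generates: if EOF is a sentinel that the input returns again on every read once the text is exhausted, then a loop for (EOF)+ can never fail to match, so it has no natural stopping point. The names CharSource, matchEofPlus, and the cap parameter are hypothetical, introduced only for illustration.

```java
public class EofLoopSketch {
    // Hypothetical EOF sentinel, in the spirit of the -1 convention discussed in this thread.
    static final int EOF = -1;

    // Stand-in for a lexer's input: once the text is exhausted, every read returns EOF.
    static final class CharSource {
        private final String text;
        private int pos = 0;

        CharSource(String text) { this.text = text; }

        // EOF is reported but never consumed, so it repeats on every later call.
        int read() { return pos < text.length() ? text.charAt(pos++) : EOF; }
    }

    // Naive match of (EOF)+: skip ordinary input, then count consecutive EOFs.
    // Without the cap the second loop would never terminate.
    static int matchEofPlus(CharSource in, int cap) {
        while (in.read() != EOF) {
            // skip ordinary characters
        }
        int matched = 1; // the EOF that ended the loop above
        while (matched < cap && in.read() == EOF) {
            matched++;
        }
        return matched;
    }

    public static void main(String[] args) {
        System.out.println(matchEofPlus(new CharSource("ab"), 1000));
    }
}
```

Whatever cap is passed in, the count comes back equal to the cap, which suggests that (EOF)+ only makes sense if matching EOF somehow consumes it.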

Ter

On Wednesday, February 5, 2003, at 01:48 PM, Brian Smith wrote:

> Terence Parr wrote:
>> On Tuesday, February 4, 2003, at 04:39 PM, micheal_jor
>> <open.zone at virgin.net> wrote:
>
>> Hmm....yeah, I'm not sure.  What character would it be?  We already
>> use (char)-1 in Java, which I think is wrong since 0xFFFF is a valid
>> char in some script.  Any Unicode geniuses out there?
>
> 0xFFFF is not an assigned Unicode codepoint. It seems that the intent
> of the Unicode Consortium is to have this always be the case, so that
> the two's complement representation of -1 (e.g. 16-bit 0xFFFF, 32-bit
> 0xFFFFFFFF, 64-bit 0xFFFFFFFFFFFFFFFF, etc.) never represents a
> character.
>
> Actually, I believe the two-byte sequence 0xFF 0xFF is not valid UTF-8,
> UCS-2, UTF-16, UTF-32, or UCS-4, but I am not 100% sure about the
> 32-bit representations.
>
> http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?scr=86&b=0
>
> - Brian
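
Parts of the above can be checked from Java itself. The following is a rough sketch, not a statement about how ANTLR handles EOF: it confirms that (char)-1 is the code unit 0xFFFF, asks the JVM whether that code point is assigned, and tests only the UTF-8 half of the claim about the bytes 0xFF 0xFF. The class and method names are made up for illustration, and the isAssigned result reflects whichever Unicode version the running JVM implements.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EofCharSketch {
    // Casting -1 to char keeps only the low 16 bits, giving the code unit 0xFFFF.
    static char eofSentinel() {
        return (char) -1;
    }

    // True if the code point is assigned in the Unicode version this JVM implements.
    static boolean isAssigned(int codePoint) {
        return Character.isDefined(codePoint);
    }

    // True if the bytes decode as strict UTF-8; a fresh decoder reports malformed input.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(eofSentinel() == '\uFFFF');
        System.out.println(isAssigned(0xFFFF));
        System.out.println(isValidUtf8(new byte[] {(byte) 0xFF, (byte) 0xFF}));
    }
}
```

The 16-bit and 32-bit encodings are left out of the sketch on purpose, since the claim about them is the less certain one.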
>
>
>
>
> Your use of Yahoo! Groups is subject to 
> http://docs.yahoo.com/info/terms/
>
>
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Lecturer in Comp. Sci., University of San Francisco


 



