[antlr-interest] Re: Problems with Unicode support in ANTLR

Thu May 16 23:07:47 PDT 2002

I decided I would have trouble enumerating the characters that would never
appear in Ids (given that Japanese is included, there are probably fewer than
32000 to exclude, which still leaves the bitset problem).

So I put them all in.
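
To illustrate the storage issue: a flat bitset covering every 16-bit
character needs 65536 bits (8 KB) per set, however few ranges the rule
names. A minimal sketch of one alternative follows, assuming the set is kept
as sorted inclusive ranges and searched by binary search. The class
CharRangeSet and its methods are hypothetical names made up for this
example; this is not ANTLR's API or how ANTLR generates its sets.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not part of ANTLR): a character set stored as
 * sorted, non-overlapping inclusive ranges instead of one bit per char.
 */
final class CharRangeSet {
    private final char[] lows;   // first char of each range, ascending
    private final char[] highs;  // last char of each range, inclusive

    /** Takes lo0, hi0, lo1, hi1, ... sorted and non-overlapping. */
    CharRangeSet(char... pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected lo/hi pairs");
        }
        lows = new char[pairs.length / 2];
        highs = new char[pairs.length / 2];
        for (int i = 0; i < lows.length; i++) {
            lows[i] = pairs[2 * i];
            highs[i] = pairs[2 * i + 1];
            if (lows[i] > highs[i] || (i > 0 && lows[i] <= highs[i - 1])) {
                throw new IllegalArgumentException("ranges must be sorted and disjoint");
            }
        }
    }

    /** Binary search over the range starts, then one bound check. */
    boolean contains(char c) {
        int i = Arrays.binarySearch(lows, c);
        if (i >= 0) {
            return true;              // c is exactly the start of a range
        }
        int idx = -i - 2;             // last range whose start is below c
        return idx >= 0 && c <= highs[idx];
    }

    /** Number of chars stored, for comparison with a 65536-bit bitset. */
    int storageChars() {
        return lows.length * 2;
    }

    /** First-character set of the IDENT rule quoted in this thread. */
    static CharRangeSet identStart() {
        return new CharRangeSet('$', '$', '_', '_', 'a', 'z', '\u0080', '\uFFFE');
    }

    /** Following-character set of the same rule (adds the digits). */
    static CharRangeSet identPart() {
        return new CharRangeSet('$', '$', '0', '9', '_', '_', 'a', 'z', '\u0080', '\uFFFE');
    }
}
```

With this layout the first-character set of IDENT takes 8 chars of storage,
and a lookup costs a binary search over a handful of entries rather than a
single bit test, so it trades a little speed for a lot of space.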
matthew
----- Original Message -----
From: "micheal_jor" <open.zone at virgin.net>
To: <antlr-interest at yahoogroups.com>
Sent: Friday, May 17, 2002 3:59 PM
Subject: [antlr-interest] Re: Problems with Unicode support in ANTLR

> --- In antlr-interest at y..., "Matthew Ford" <Matthew.Ford at f...> wrote:
> > This approach would not work for me as I need
> >
> > IDENT
> >  options {testLiterals=true;
> >      paraphrase = "an identifier";}
> >  : ('a'..'z'|'_'|'$'|'\u0080'..'\uFFFE')
> > ('a'..'z'|'_'|'0'..'9'|'$'|'\u0080'..'\uFFFE')*
> >  ;
> >
> > So rather than sub-blocks, what I need is an efficient compression method
> > to store these bitsets in ANTLR.
>
> The '\u0080'..'\uFFFE' range might be overkill as many characters in that
> range would not [normally] be usable as parts of an IDENT.
>
> You are right that more efficient BitSet representations are needed
> for ANTLR's Unicode support in general.
>
> Micheal
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/