[antlr-interest] Re: Problems with Unicode support in ANTLR

Thu May 16 18:29:20 PDT 2002

> Okay, I see what you are talking about. Java's Character class does have
> support for some categories; see
> http://java.sun.com/j2se/1.4/docs/api/java/lang/Character.html
>
> Please look at the listed categories and let me know if it is too
> limited. In particular, java.lang.Character.getType(), and the static
> final category constants.

I saw the static constants but couldn't see that they were used
anywhere. Frankly, I can't believe someone actually thought
"getType()" makes sense as the accessor for a character's Unicode
General Category. What happened to getCategory() or
getGeneralCategory()? Sheesh!

In any case, you are right that the feature is supported.
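
For the record, here is a minimal sketch of how the lookup can be used
directly. The class name UnicodeCategory and the generalCategory()
wrapper are hypothetical names of my own, not anything in the JDK or in
ANTLR; only Character.getType() and the END_PUNCTUATION constant come
from java.lang.Character.

```java
final class UnicodeCategory {
    private UnicodeCategory() {}

    // Hypothetical alias: returns exactly what Character.getType() returns,
    // under a name that says "Unicode General Category".
    static int generalCategory(char c) {
        return Character.getType(c);
    }

    // True when c is in general category Pe (closing punctuation).
    static boolean isEndPunctuation(char c) {
        return generalCategory(c) == Character.END_PUNCTUATION;
    }

    public static void main(String[] args) {
        System.out.println(isEndPunctuation(')'));      // ASCII right parenthesis
        System.out.println(isEndPunctuation('\u300D')); // CJK right corner bracket
        System.out.println(isEndPunctuation('('));      // opening punctuation (Ps)
    }
}
```

The point of the wrapper is only readability; it adds no behavior of
its own.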

> I would rather not have my Unicode-parsing application depend on IBM's
> library since I would have to distribute it. I think that the
> java.lang.Character class's support is sufficient.

For the feature we've discussed so far, yes it is. If memory serves,
the license for IBM's package doesn't forbid extracting what we need
into ANTLR.

> Presumably, the modified ANTLR would then generate code like this:
>      int type = Character.getType(LA(1));
>      switch (type) {
>         case Character.END_PUNCTUATION:
>              mRULE(true);
>              theRetToken=_returnToken;
>              break;
>         ....
>      }
> 
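
To make the idea above concrete outside of a generated lexer, here is a
rough standalone sketch of the same category-based dispatch. Everything
named here (CategoryDispatch, classify(), and the integer token kinds)
is made up for illustration and is not what ANTLR generates; it only
assumes that the lookahead character is available as a char.

```java
final class CategoryDispatch {
    private CategoryDispatch() {}

    // Hypothetical token kinds, for illustration only.
    static final int OTHER = 0;
    static final int OPEN = 1;
    static final int CLOSE = 2;
    static final int LETTER = 3;
    static final int DIGIT = 4;

    // Chooses a token kind from the Unicode general category of the
    // lookahead character instead of enumerating individual code points.
    static int classify(char la1) {
        switch (Character.getType(la1)) {
            case Character.START_PUNCTUATION:
                return OPEN;
            case Character.END_PUNCTUATION:
                return CLOSE;
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.OTHER_LETTER:
                return LETTER;
            case Character.DECIMAL_DIGIT_NUMBER:
                return DIGIT;
            default:
                return OTHER;
        }
    }
}
```

A real lexer would call the matching rule method in each case rather
than return a constant, but the switch on the category value is the
same.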

Erm... Terence, are you there?  ;-)

Micheal
