[antlr-interest] Re: Problems with Unicode support in ANTLR
Brian Smith
brian-l-smith at uiowa.edu
Thu May 16 15:05:41 PDT 2002
micheal_jor wrote:
> --- In antlr-interest at y..., Brian Smith <brian-l-smith at u...> wrote:
>
> No, Unicode blocks are a different concept from Unicode General
> Categories. I don't think Java's standard libraries support Unicode
> categories.
Okay, I see what you are talking about. Java's Character class does have
support for some categories; see
http://java.sun.com/j2se/1.4/docs/api/java/lang/Character.html
Please look at the listed categories and let me know whether they are too
limited. In particular, see java.lang.Character.getType() and the static
final category constants.
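To make the block/category distinction concrete, here is a small sketch that asks java.lang.Character for both the Unicode block and the general category of a few characters, using only the standard library. The class CategoryDemo and its helper method are made up for illustration.

```java
// Illustrative only: contrasts Unicode blocks with Unicode general
// categories using java.lang.Character (no extra libraries).
public class CategoryDemo {
    // A block is a contiguous code-point range; a general category
    // classifies each individual character (Lu, Ll, Ps, Pe, Nd, ...).
    static boolean sameBlockDifferentCategory(char a, char b) {
        return Character.UnicodeBlock.of(a) == Character.UnicodeBlock.of(b)
            && Character.getType(a) != Character.getType(b);
    }

    public static void main(String[] args) {
        char[] samples = {'A', 'a', '(', ')', '7'};
        for (char c : samples) {
            // All of these sit in BASIC_LATIN but have different categories
            System.out.println("'" + c + "' block=" + Character.UnicodeBlock.of(c)
                + " category=" + Character.getType(c));
        }
        // '(' is START_PUNCTUATION (Ps) and ')' is END_PUNCTUATION (Pe)
        System.out.println(sameBlockDifferentCategory('(', ')'));                  // true
        System.out.println(Character.getType(')') == Character.END_PUNCTUATION);  // true
    }
}
```

The category values printed by the loop are the int forms of the static final byte constants on Character, which is what a generated lexer would compare against.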
>> I was thinking of patching ANTLR's Java generator to be able to use
>> named Unicode character categories as "pre-defined" "protected"
>> lexer rules, but supporting anything more than what the Character class
>> handles is over my head.
>
> That would be a useful addition: the ability to define
> such "preset" rules in ANTLR. I can do the work for Unicode
> categories once the basic framework is in place.
> Ter, is it OK to have ANTLR rely on additional libraries, or would I
> have to port the required Unicode functionality into ANTLR
> directly?
I would rather not have my Unicode-parsing application depend on IBM's
library since I would have to distribute it. I think that the
java.lang.Character class's support is sufficient.
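A generator patch would presumably need to map a category name written in the grammar to one of those Character constants. A minimal sketch, assuming the grammar uses the two-letter General Category abbreviations from the Unicode Character Database (e.g. "Pe"); the class UnicodeCategories and its inCategory() method are hypothetical, not part of ANTLR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of ANTLR): resolves a two-letter Unicode
// General Category abbreviation to the matching java.lang.Character
// constant, so a named-category lexer rule can be tested with getType().
public class UnicodeCategories {
    static final Map<String, Byte> CATEGORIES = new HashMap<>();

    static {
        // Letters
        CATEGORIES.put("Lu", Character.UPPERCASE_LETTER);
        CATEGORIES.put("Ll", Character.LOWERCASE_LETTER);
        CATEGORIES.put("Lt", Character.TITLECASE_LETTER);
        CATEGORIES.put("Lm", Character.MODIFIER_LETTER);
        CATEGORIES.put("Lo", Character.OTHER_LETTER);
        // Marks
        CATEGORIES.put("Mn", Character.NON_SPACING_MARK);
        CATEGORIES.put("Mc", Character.COMBINING_SPACING_MARK);
        CATEGORIES.put("Me", Character.ENCLOSING_MARK);
        // Numbers
        CATEGORIES.put("Nd", Character.DECIMAL_DIGIT_NUMBER);
        CATEGORIES.put("Nl", Character.LETTER_NUMBER);
        CATEGORIES.put("No", Character.OTHER_NUMBER);
        // Punctuation (the Pi and Pf constants need Java 1.4 or later)
        CATEGORIES.put("Pc", Character.CONNECTOR_PUNCTUATION);
        CATEGORIES.put("Pd", Character.DASH_PUNCTUATION);
        CATEGORIES.put("Ps", Character.START_PUNCTUATION);
        CATEGORIES.put("Pe", Character.END_PUNCTUATION);
        CATEGORIES.put("Pi", Character.INITIAL_QUOTE_PUNCTUATION);
        CATEGORIES.put("Pf", Character.FINAL_QUOTE_PUNCTUATION);
        CATEGORIES.put("Po", Character.OTHER_PUNCTUATION);
        // Symbols
        CATEGORIES.put("Sm", Character.MATH_SYMBOL);
        CATEGORIES.put("Sc", Character.CURRENCY_SYMBOL);
        CATEGORIES.put("Sk", Character.MODIFIER_SYMBOL);
        CATEGORIES.put("So", Character.OTHER_SYMBOL);
        // Separators
        CATEGORIES.put("Zs", Character.SPACE_SEPARATOR);
        CATEGORIES.put("Zl", Character.LINE_SEPARATOR);
        CATEGORIES.put("Zp", Character.PARAGRAPH_SEPARATOR);
        // Other
        CATEGORIES.put("Cc", Character.CONTROL);
        CATEGORIES.put("Cf", Character.FORMAT);
        CATEGORIES.put("Cs", Character.SURROGATE);
        CATEGORIES.put("Co", Character.PRIVATE_USE);
        CATEGORIES.put("Cn", Character.UNASSIGNED);
    }

    // True if c's general category is the one named by abbr, e.g. "Pe"
    static boolean inCategory(char c, String abbr) {
        Byte type = CATEGORIES.get(abbr);
        if (type == null) {
            throw new IllegalArgumentException("unknown category: " + abbr);
        }
        return Character.getType(c) == type;
    }

    public static void main(String[] args) {
        System.out.println(inCategory(')', "Pe"));  // true
        System.out.println(inCategory('a', "Lu"));  // false
    }
}
```

Rejecting unknown names at lookup time would let a generator report a grammar error instead of silently emitting a rule that never matches.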
Presumably, the modified ANTLR would then generate code like this:
int type = Character.getType(LA(1));   // general category of the lookahead char
switch (type) {
    case Character.END_PUNCTUATION:    // e.g. ')', ']', '}'
        mRULE(true);
        theRetToken = _returnToken;
        break;
    ....
}
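The fragment above relies on the generated lexer's internals (LA(), mRULE(), _returnToken), so it won't compile on its own. As a rough standalone sketch of the same dispatch idea, assuming a simplified scanner whose rule names (OPEN, CLOSE, LETTER, DIGIT, OTHER) are invented for the example and returned as strings instead of tokens:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: LA() and the rule names are simplified stand-ins,
// not ANTLR's generated CharScanner code.
public class CategoryDispatch {
    private static final char END = '\uFFFF';  // end-of-input marker for this sketch
    private final String input;
    private int pos = 0;

    CategoryDispatch(String input) {
        this.input = input;
    }

    // i-th character of lookahead (1-based), or END past the input
    char LA(int i) {
        int p = pos + i - 1;
        return p < input.length() ? input.charAt(p) : END;
    }

    // Chooses a rule from the general category of LA(1), then consumes it
    String nextRule() {
        int type = Character.getType(LA(1));
        pos++;
        switch (type) {
            case Character.START_PUNCTUATION:
                return "OPEN";
            case Character.END_PUNCTUATION:
                return "CLOSE";
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
                return "LETTER";
            case Character.DECIMAL_DIGIT_NUMBER:
                return "DIGIT";
            default:
                return "OTHER";
        }
    }

    // Runs nextRule() until the input is exhausted
    List<String> ruleSequence() {
        List<String> rules = new ArrayList<>();
        while (LA(1) != END) {
            rules.add(nextRule());
        }
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(new CategoryDispatch("f(x1)").ruleSequence());
        // prints [LETTER, OPEN, LETTER, DIGIT, CLOSE]
    }
}
```

A single switch keyed on getType() picks one alternative per category, so a grammar that also matches specific characters from those categories would presumably still need ANTLR's usual prediction logic to resolve the overlap.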
What do you think?
- Brian