[antlr-interest] Time for another question about Unicode support

David Ewing dewing at apple.com
Tue Oct 23 22:29:17 PDT 2001


I've been using ANTLR for a while now, and I need to get it to handle
Unicode input. We use ANTLR to parse Java source code for indexing
information in Project Builder, Apple's IDE for Mac OS X.

It's not obvious to me how much work has gone into this area for 2.7.2.
Scanning the list archives, it looks like some work has been done to support
it in Java parsers, but not in C++ parsers. Of course, we're generating a
C++ parser. (Yes, we use a C++ parser, called from Objective-C, to parse
Java code!)

So, in my search for what to do along these lines, I ran into ICU
(International Components for Unicode), an open source library from IBM
<http://oss.software.ibm.com/icu>. Older versions of it are the basis of the
i18n classes in the JDK. There are both Java and C++ versions. It seems to
contain appropriate character set classes, which might solve the character
set issue on the C++ side. So, has using ICU been considered for ANTLR?

I may be able to help out in this effort, though for me that would mean
starting work on it soon. My guess is that time pressure will force me to
write a custom lexer to deal with Unicode, one that would return ID tokens
whose text is a UTF-8 string. But I'd rather not do it that way. I'd rather
help out adding the support "the right way".

Anyhow, any info or recommendations would be greatly appreciated.

Dave
-- 
David Ewing, Mac OS X Development Apps, Apple Computer
--
