[antlr-interest] Re: Unicode support

Mark Lentczner markl at glyphic.com
Wed May 19 08:35:35 PDT 2004


> I have to generate in C++ and I will need it to parse strings with
> Asian languages. So I guess I need some pretty efficient Unicode
> support.
I'm not clear on this.  Is your requirement that you have to parse 
constructs like:
	"<Asian characters here>"
where the only non-US-ASCII characters appear between quotes, and the 
only restriction on what appears between those quotes is that it is a 
sequence of valid Unicode characters?  If so, this is easily doable 
with Antlr in C++, if you treat your input as UTF-8.
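
To illustrate why UTF-8 makes this case easy: every byte of a multi-byte 
UTF-8 sequence is 0x80 or above, so it can never be mistaken for the ASCII 
quote character.  A byte-oriented scanner can therefore pass those bytes 
through untouched.  The following is only a minimal hand-written sketch of 
that idea, not Antlr-generated code; the function name scanStringLiteral is 
made up for illustration, and it assumes no escape sequences and does not 
check that the bytes form well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Antlr): given UTF-8 encoded input and
// the index of an opening '"', return the index one past the closing '"',
// or std::string::npos if there is no opening quote at `start` or the
// literal is unterminated.
std::size_t scanStringLiteral(const std::string& input, std::size_t start) {
    if (start >= input.size() || input[start] != '"') {
        return std::string::npos;
    }
    for (std::size_t i = start + 1; i < input.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(input[i]);
        if (b == '"') {
            return i + 1;  // one past the closing quote
        }
        // Any byte >= 0x80 is part of a multi-byte UTF-8 sequence and can
        // never equal '"' (0x22), so it is simply skipped over here.
    }
    return std::string::npos;
}
```

In an Antlr lexer the same effect would come from letting the string 
literal rule accept any byte other than the quote, assuming the lexer's 
character vocabulary is set to cover all 8-bit values.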

If you need to support identifiers composed of non-US-ASCII characters, 
it is a bit more difficult, but still doable.

This is exactly what I'm doing: My language is defined over the full 
Unicode character set, allows Unicode in string literals, comments, 
identifier names, and in a few cases operators (such as U+00F7, the 
division sign).  I lex and parse the language with Antlr, generating a 
C++ lexer that accepts a UTF-8 encoded Unicode stream of bytes.
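
For identifiers and operators, the lexer has to know which code point a 
run of bytes encodes, not just that the bytes are non-ASCII.  As a rough 
sketch of that decoding step (a generic UTF-8 decoder, not the code from 
my lexer; the name decodeUtf8 and its return convention are assumptions 
made for this example):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode one code point from UTF-8 text starting at
// byte index `pos`.  On success, advance `pos` past the sequence and
// return the code point.  On truncated or malformed input, return -1 and
// leave `pos` unchanged.
long decodeUtf8(const std::string& s, std::size_t& pos) {
    if (pos >= s.size()) return -1;
    unsigned char b0 = static_cast<unsigned char>(s[pos]);

    std::size_t len;  // total bytes in the sequence
    long cp;          // code point accumulated so far
    long min;         // smallest value allowed for this length
    if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return -1;  // stray continuation byte or invalid lead byte

    if (pos + len > s.size()) return -1;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        unsigned char b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) return -1;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }

    // Reject overlong encodings, surrogates, and values past U+10FFFF.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;
    }
    pos += len;
    return cp;
}
```

With this, the two bytes 0xC3 0xB7 decode to U+00F7, the division sign 
mentioned above, which the lexer could then match as an operator.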

I'd be happy to share my work on this.

> Hope 3.0 will be out before end of summer because that's my deadline.
I think the time frame is longer than that.

	- Mark



 