[antlr-interest] Unicode support in both v2.7 and v3

Jim Idle jimi at temporal-wave.com
Mon Jun 4 13:37:14 PDT 2007


There are C input streams for 8 bit (no translation from anything) and
UCS2/UTF-16 (again no translation; it just takes either the 8 bit
number or the 16 bit number and assumes that this matches the character
encoding in your grammar). It is of course trivial to add knowledge of a
particular encoding and translate the input stream to the Unicode code
points that the generated parser expects. Adding UTF-32 is trivial;
adding UTF-8 is not exactly trivial, but it is not a lot more work.
Basically, the input stream should be able to supply a 32 bit
representation of the Unicode code points, given that it has knowledge
of the encoding of its input. For an 8 bit encoding, you just use the
existing 8 bit input stream and install your own function that adds
translation to the Unicode code point (assuming that the encoding is not
already correct). 16 bit is the same, and so on.
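
To give a feel for the UTF-8 case, here is a rough sketch of the decoding
step only. It is not code from the ANTLR C runtime: the names utf8_decode
and utf8_to_code_points are made up for illustration, and hooking such a
function into an input stream is left out because it depends on the
runtime version you have.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ANTLR runtime function): decode one UTF-8
 * sequence at s, with len bytes available, into a 32 bit code point.
 * Returns the number of bytes consumed, or 0 if the input is malformed
 * or truncated. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 1..4 bytes. */
    static const uint32_t min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t c;

    if (len == 0) return 0;

    if (s[0] < 0x80)                { c = s[0];        need = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; need = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; need = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; need = 4; }
    else return 0;                  /* stray continuation or bad lead byte */

    if (need > len) return 0;       /* sequence runs past the buffer */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (c < min_value[need]) return 0;          /* overlong encoding */
    if (c > 0x10FFFF) return 0;                 /* beyond Unicode range */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;   /* surrogate, not a scalar */

    *cp = c;
    return need;
}

/* Hypothetical helper: decode a whole buffer into 32 bit code points.
 * Returns the number of code points written, or (size_t)-1 if the input
 * is malformed or out is too small. */
size_t utf8_to_code_points(const unsigned char *in, size_t in_len,
                           uint32_t *out, size_t out_cap)
{
    size_t pos = 0, count = 0;

    while (pos < in_len) {
        uint32_t cp;
        size_t used = utf8_decode(in + pos, in_len - pos, &cp);

        if (used == 0 || count == out_cap) return (size_t)-1;
        out[count++] = cp;
        pos += used;
    }
    return count;
}
```

An 8 bit code page would not need any of this; a 256 entry lookup table
from byte value to code point does the same job.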

 

Jim

 

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Micheal J
Sent: Saturday, June 02, 2007 2:34 AM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Unicode support in both v2.7 and v3

 

Hi,

 

This doesn't directly answer your questions, but why not try the C target
for V3? You could always wrap the C runtime API up in C++ classes anyway.
There was some discussion about Unicode and the C target a while back on
the list. All you might need to do is supply an input stream of wchar_t
characters (I could be wrong on this, so search the list archives).

 

Micheal

 

-----------------------
The best way to contact me is via the list/forum. My time is very
limited. 

	-----Original Message-----
	From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of YiQing Yang
	Sent: 02 June 2007 01:15
	To: antlr-interest at antlr.org
	Subject: [antlr-interest] Unicode support in both v2.7 and v3

	 Hi,

	 

	I am trying to use ANTLR to generate a query parser in C++.
Since C++ is not supported yet by v3, I am trying v2.7.6 right now.  I
would like to know how Unicode is supported in both v2.7 and v3.
Does it support an input stream that is a wchar_t*? Which UTF encoding
formats does it support (UTF-8, UTF-16, UTF-32)? From the Reference Manual
for v2, it seems that Unicode is not fully supported yet, and the manual
says that v3 will have better Unicode support. Is Unicode fully supported
in v3?

	 

	I am new to ANTLR. Sorry if those questions have been answered
before.

	 

	Thanks,

	 

	Yiqing Yang
