[antlr-interest] C++ and Unicode

Ruslan Zasukhin sunshine at public.kherson.ua
Mon Aug 16 02:36:16 PDT 2004


On 8/16/04 12:09 PM, "Ric Klaren" <klaren at cs.utwente.nl> wrote:

> Hi,
> 
> On Sat, Aug 14, 2004 at 02:26:52PM +0200, Martin Probst wrote:
>> I'm about to start writing a parser which has to support Unicode. I want
>> to use ANTLR for this task and the output has to be Unicode. I was quite
>> surprised to see that currently Unicode with C++ and ANTLR is not really
>> supported. The only thing I found about it is the patch or special
>> distribution (?) by Ric Klaren.
>> 
>> Can somebody point me to information about the current status of C++
>> Unicode support in ANTLR? Is the Unicode version on this page:
>> <http://wwwhome.cs.utwente.nl/~klaren/antlr/right.html> in a usable
>> state or more some kind of development?
> 
> The patch on my page is a hack; there's a better approach, detailed a while
> ago by Mark Lentczner.
> 
> See this thread:
> 
> http://groups.yahoo.com/group/antlr-interest/messages/11772
> 
> I also got another 'hack' that makes the C++ part read UTF8 and store it in
> the backend in std::string but UTF8 encoded. The framework for that can be
> adapted quite easily to deal with other input encodings and output
> encodings.
> 
> I'm waiting for some patches Mark promised me a while back. After that I'll
> release a new snapshot with 2.7.4 bugfixes, the C++ port of the
> TokenStreamRewriteEngine, UnicodeCharBuffer and UnicodeCharScanner. Also a
> new reference counter will be used for tokens (for starters). Character
> literals are limited to \ufffe though due to the ANTLR 2 analysis engine.

Hi Ric,

Yes, we also think that UTF8 should be the first step into the Unicode world,
and it looks like a relatively easy step.

For example, in our SQL grammar we have:

A) keywords -- always English

B) identifiers -- we want to (and can) extract them as UTF8 strings, which we
will later convert to UTF16 ourselves. An identifier is, e.g., the name of a
table or field.

C) string constants --

    fld = 'affjsdfhjkfhjksdhf '

These can also be extracted as UTF8, and we will convert them to UTF16 later.

We need to convert to UTF16 because we use the IBM ICU library, so all our
internal algorithms work in UTF16.
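
To make the UTF8-to-UTF16 step concrete, here is a minimal sketch of such a conversion in standard C++ (C++11 or later). The function name utf8_to_utf16 is hypothetical and is not part of ANTLR or ICU. In a code base that already uses ICU, its own conversion routines (for example icu::UnicodeString::fromUTF8) would probably be the natural choice. The sketch only shows what the conversion involves.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an ANTLR or ICU API): decode a UTF-8 byte
// string, e.g. the text of an identifier or string-constant token,
// into UTF-16 code units. Throws std::invalid_argument on bad input.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // decoded code point
        std::size_t len = 0;    // sequence length in bytes

        // The lead byte determines the sequence length.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        // Reject overlong encodings, surrogates and out-of-range values.
        static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this, the lexer can keep token text as UTF8 in std::string, and only identifiers and string constants are passed through the conversion before they reach the UTF16 internals.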

So yes, please add UTF8 support, even if it is only partial and does not
cover keywords.
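
One reason leaving keywords out costs little: in UTF8 every byte below 0x80 is a plain ASCII character, and bytes of multi-byte sequences are always 0x80 or above, so English keywords can be compared byte by byte without decoding. A minimal sketch follows. The names ascii_upper and matches_keyword are hypothetical, and case-insensitive keywords are an assumption about the SQL dialect.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Fold ASCII letters only. Bytes >= 0x80, which belong to multi-byte
// UTF-8 sequences, pass through unchanged.
inline unsigned char ascii_upper(unsigned char c) {
    return (c >= 'a' && c <= 'z')
               ? static_cast<unsigned char>(c - 'a' + 'A')
               : c;
}

// Hypothetical helper: true if the UTF-8 token text equals the ASCII
// keyword, ignoring ASCII case. No UTF-8 decoding is required.
bool matches_keyword(const std::string& token, const char* keyword) {
    const std::size_t n = std::strlen(keyword);
    if (token.size() != n) return false;
    for (std::size_t i = 0; i < n; ++i) {
        if (ascii_upper(static_cast<unsigned char>(token[i])) !=
            ascii_upper(static_cast<unsigned char>(keyword[i])))
            return false;
    }
    return true;
}
```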


-- 
Best regards,
Ruslan Zasukhin      [ I feel the need...the need for speed ]
-------------------------------------------------------------
e-mail: ruslan at paradigmasoft.com
web: http://www.paradigmasoft.com

To subscribe to the Valentina mail list go to:
http://lists.macserve.net/mailman/listinfo/valentina
-------------------------------------------------------------
