[antlr-interest] C++ code target

Fri Apr 6 15:10:58 PDT 2007

Jim:

I don't really _need_ a C++ interface, I was just hoping to convince Ric not
to make the same mistake again with Antlr 3.  I can understand not going
back and adding Unicode support to Antlr 2.x, but IMO there is no good
reason for not building in Unicode support right from the start in any new
implementation.  

This is 2007 after all.  Unicode isn't something new, and we're long past
the point where everyone using a computer uses an 8-bit character set.  It
just doesn't make sense for a parsing tool like Antlr to be restricted to
parsing 8-bit characters.  I have a feeling most of the people around here
aren't Windows programmers, but surely the Mac and Linux are Unicode by now,
aren't they?

In any case, sounds like you've got it covered in the C implementation.
Unfortunately, it's too late to consider using Antlr 3 for our product as we
are going to ship by July 30 and it would be suicide to rewrite all our
parsers right now (there are 5 of them).  I'll have a look at the C
implementation for our next release though.

--
Don

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
>   [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
> Sent: Tuesday, April 03, 2007 2:49 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] C++ code target
> 
> What the C implementation does is deal with everything internally as
> UTF-32. You can then supply an input stream that provides each character
> as a 32-bit value, regardless of the input encoding (which the input
> stream is responsible for dealing with). Because all the library code
> then deals with 32-bit characters regardless of the input stream, there
> is no need for anything to know about the size of the incoming characters
> except the input stream itself, which may need to know how to reset to a
> specific character offset etc. The advantage is that there is little if
> any overhead. The token stream holds offsets that the input stream knows
> how to convert to 'strings' if they are referenced. There is currently
> support for Latin-1 and UTF-16 (UCS2 I suppose) input streams and string
> manipulations for both (which will probably be easier to handle in C++ I
> suspect ;-).
> 
> If you really need a C++ interface and cannot wait for Ric's
> implementation, then you could use the C output and create a wrapper
> class for it? I was thinking of adding this to the output for C anyway
> in fact so that you could include the header and it would be a class
> definition if asked for.
> 
> Ter - perhaps we can consider the ability for a target to define
> multiple output files (call lots of templates like headerfile() with the
> same input as headerfile/outputfile?). This would make it a bit neater
> to generate a COM interface for instance - however it can all be done in
> the same header file in the end of course, with #define.
> 
> Jim
>
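
To make the scheme Jim describes a bit more concrete, here is a minimal
sketch in C of an input stream that hands every character to its consumer
as a 32-bit value, whatever the underlying encoding. All of the names
(CharStream, stream_latin1, stream_ucs2, stream_next, stream_seek,
count_char, STREAM_EOF) are made up for illustration and are not the
ANTLR 3 C runtime API. It also assumes the 16-bit input is plain UCS-2 in
host byte order, so surrogate pairs are not decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the ANTLR 3 C runtime API. */
#define STREAM_EOF 0xFFFFFFFFu

typedef struct CharStream CharStream;
struct CharStream {
    const void *data;    /* raw input in its native encoding */
    size_t      length;  /* number of code units in data */
    size_t      pos;     /* current offset, in code units */
    /* The only encoding-specific piece: widen unit i to 32 bits. */
    uint32_t  (*at)(const CharStream *s, size_t i);
};

/* Latin-1 bytes map one-to-one onto the first 256 code points. */
static uint32_t latin1_at(const CharStream *s, size_t i) {
    return ((const uint8_t *)s->data)[i];
}

/* Assumption: UCS-2 in host byte order, no surrogate handling. */
static uint32_t ucs2_at(const CharStream *s, size_t i) {
    return ((const uint16_t *)s->data)[i];
}

CharStream stream_latin1(const uint8_t *buf, size_t n) {
    CharStream s = { buf, n, 0, latin1_at };
    return s;
}

CharStream stream_ucs2(const uint16_t *buf, size_t n) {
    CharStream s = { buf, n, 0, ucs2_at };
    return s;
}

/* Return the next character as a 32-bit value, or STREAM_EOF at the end. */
uint32_t stream_next(CharStream *s) {
    assert(s != NULL);
    if (s->pos >= s->length) {
        return STREAM_EOF;
    }
    return s->at(s, s->pos++);
}

/* Reset the stream to a specific character offset. */
void stream_seek(CharStream *s, size_t offset) {
    assert(s != NULL && offset <= s->length);
    s->pos = offset;
}

/* Encoding-agnostic consumer: it only ever sees 32-bit characters. */
size_t count_char(CharStream *s, uint32_t target) {
    size_t   n = 0;
    uint32_t c;
    stream_seek(s, 0);
    while ((c = stream_next(s)) != STREAM_EOF) {
        if (c == target) {
            n++;
        }
    }
    return n;
}
```

The consumer (count_char here, standing in for a lexer) never looks at the
width of the input; only the two small "at" functions know about it, so
supporting another encoding would mean adding one more of those rather
than touching the consuming code.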