[antlr-interest] C runtime beta 1 for ANTLR 3.1 (includes C++ compatibility)

Jim Idle jimi at temporal-wave.com
Mon Feb 25 17:13:43 PST 2008


For those of you waiting for the C runtime to be updated for 3.1, I have
good news. Also good news for those wanting to use C++ with ANTLR 3.

 

Hopefully soon, but he is busy I know, Ter will make available the
following on the download area. You will need all of these components to
be successful (with ANTLR 3.1 C Runtime, with women, men, money,
whatever, you need something else):

 

·         New API documentation (this is still getting updated with
release change notes and more how-to stuff), which is already on the web
site in the usual place. From the home page see the Documentation
section and take the API Documentation link, then take the C link. I
usally get lots of typo corrections for comment text, and nothing at all
for the docs, so perhaps nobody reads these ;-)

·         New release distribution in: libantlr3c.3.0b1.tar.gz – you
will need this to build the new runtime, but the new examples are now
self contained if you are a Windows developer.

·         New examples tar, which I hope will be available in the same
place as these beta snapshots.

 

I have begun the “What’s changed in 3.1” docs, but there is so much that
they are not done yet. However the example projects are pretty well
documented and should show you how to go on. So, please use a
combination for now. If you are not comfortable with this, then please
wait for the official release, where the documentation will be much
improved even over the current improvements.

 

Things I know are not yet working:

 

1)      The debugger templates are not up to date, so debugging remotely
with ANTLRWorks will not work yet. The C library code has been
implemented though, which means the runtime is now dependant on socket
libraries (which configure should pick up)

2)      Tree parsers that rewrite trees leak some memory for the moment,
but this will be fixed of course. For now, do not try to free the tree
node streams (see the example polydiff, which just ignores freeing the
streams, because it will core dump if you do).

3)      Rewrites of imaginary tokens with both token and text may not
work because the Java templates rely on being able to overload method
calls and I can’t do that in C. It just needs a little more
investigation.

 

However, the following things have happened:

 

1)      Performance improvements for backtracking parsers;



2)      Imported grammars now work so large grammars can be split up
into smaller pieces and produce separate translation units. This works
as per the Java version – see example project composite-java;



3)      Tree rewriting from tree parsers now works a la Java (but the
caveat 3) above;



4)      The runtime headers and generated code are now C++ compatible.
The C parser example (in the C runtime examples), now detects if it is
being compiled as C++ and integrates a dinky little C++ class within the
action code. You still need to use the C stuff for accessing token
values and so on, but I will shortly implement some C++ convenience
wrappers for those things. This is not really a big deal to my mind, as
the main thing is that you can now compiler the generated C code as C++
(need to tell the compiler to do this, as the suffix is still .c), and
can make calls directly to your C++ code, instantiate C++ classes, and
so on and so forth. I have not performed lots of testing on this, so I
appreciate any bug reports with this. Examples works with VS2005 C++ and
g++ with all warnings turned on, so I think it is pretty sound. (NB: Do
not compile the runtime library as C++, there is no reason).

Later I may well produce a complete C++ implementation from scratch,
however, at this point I am not sure that it buys you anything. Please
let me know if there are things you cannot do with the system as it
stands (other than access the tokens and so on using C++ objects, which
will be done later). 



5)      Quite a few bug fixes (will be documented later), but I am sure
that I lost a few bug reports and the change to the runtime for 3.1 were
quite substantial, so may need testing and re-reporting. I had no
feedback from anyone who previously reported C runtime bugs – without
feedback, I will have to assume there are no bugs ;-)



6)      Integration of the UTF converter functions from Unicode.org.
Allows you to convert between the various encodings (UCS2, UTF-8,
UTF-16, UTF32,). Will be used shortly to create a UTF32 input stream
into which UTF8 and UTF16 input can be converted.



7)      The constructors for everything now just return NULL as the
pointer if they fail and do not try to return a reason. This is because
other than specifying a bad input file name, the only reason for failure
now is being out of memory. This cleans up status checking quite a bit.



8)      All the internal collections stuff, such as Vectors and Lists
are now all rebased to be 0 index based rather than 1 based. There was
originally a reason that I did it like that (Something to do with it
being easier to track the Java stuff), but I have completely forgotten
what it was, so I just recoded everything to be 0 based. If you have
used these structures in your own code, you will need to make similar
trivial adjustments. See the dynamic scope example, which has changed
accordingly to accommodate this.



9)      The overuse of 64 bit data types has been corrected and
rationalized. 64 bit elements are now only used where they make any
sense at all, such as input stream pointers and marks and as input keys
for the Int Trie collection. I later beta will also rationalize the use
of UNIT vs INT, as this got a bit out of hand too, and is now simpler,
but not quite checked out in all places.



10)   MACROS have been updated and added to the generated code (see the
generated code for now, though there is some new MACRO documentation in
the updated API docs). There is also a new macro to reference the token
source in a lexer, which wil make things easier for some. See the
examples again for information.



11)   Many more things I cannot remember at this second and will need to
consult the back of my Marlboro Lights packets and beer napkins for.
More to come before release of course.

 

Please give this a go if you can afford the time. The lexer parser tree
parser stuff is all solid as far as I can see, but the new tre rewriting
stuff could do with some testing and I would like feedback from the C++
guys. Thanks in advance for your effort in helping me to make this
release solid – the quicker that bugs are found and fixed, the quicker
the release wil be ready as the C# stuff is pretty much there I think.

 

Cheers,

 

Jim

 

PS: Sorry in advance if I have left anything out from this, I probably
have, and so please only try this if you have a bit of spare time and
won’t email me to say how useless I am for making you spend an hour on
something ;-) [yes, people really do do that!]


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20080225/c502de51/attachment-0001.html 


More information about the antlr-interest mailing list