[antlr-interest] Updated C++ parser

Ric Klaren klaren at cs.utwente.nl
Thu Jun 10 09:26:08 PDT 2004


Hi,

Once upon a time I was planning to give a talk on Monty's tokenstream
filtering idea. Things went differently a bit but I ended up with a
modified version of the C++ parser David Wigg has been working on.

Let me cut and paste from what I added to the MyReadme.txt in the tar ball:

I Grabbed the C++ grammar as a vehicle to play with tokenstream filtering.
The plan was to attempt to make a drop in C++ preprocessor together with
#include #ifdef support. This to demo tokenstream filtering for a talk.
Just before the deadline I came to the point that I had to redesign things
to keep things nice and concise for the talk. Therefore I had to scratch
using this as a talk vehicle. Yet the added features more or less worked
and can be interesting to look at.

Added features:

- Split up the big monolithic .g file from the original into a separate lexer
  and parser.
- Added a Makefile (GNU Make/Gcc)
- Use of a Custom Token class with line and file information. This needs a
  patch on antlr. It is supplied as antlr.patch in this directory. It's
  probably present in the next snapshot after 2.7.4 release. (Once Terence
  merges the doc changes from 2.7.4 to mainline ;) but that can wait while
  he's engrossed in hacking on antlr 3 :) )
- I used a custom stream multiplexer class that handles closing of streams if
  needed. CPPStreamStack. Antlr's default thing does not handle cleanups
  (read direct java port).
- Added a ugly ancient hash table template to hash the filenames encountered.
  The tokens only store a hash value for filename. It's loosely based on
  a Modula 2 implementation of hash tables in GMD's cocktail (uses the same
  hash function).
- Handling of #line directives with the above (might be off one line here
  and there I did not check that for correctness, proof of concept it works)
- Handling of #include directives with a TokenStream multiplexer.
  this is completely done inside the lexer. To make this nice (conceptually)
  the preprocessor should become the TS multiplexer.
- Handling of simple #define/#ifdef/#ifndef/#else/#endif statements. Nested
  but basically only checking defined or not defined. (good enough for
  #include guards) This works actually quite nice. Should be easy enough to
  implement more functionality.  

As long as the preprocessor stays LL(1) it should be possible to fold the
tokenstream multiplexor and #include handling into the preprocessor, giving
a pretty nice layered design. Although sinning against the no feedback from
parser to lexer mantra.

Releasing this stuff now since I probably won't have much time to make it a
real release. So tinker with it at your own risk. It needs a cleanup. It
contains old code from David's #line handling which I did not prune and
probably there's some more virtual corpses in various virtual closets..

It could be an idea to pass newlines on to the preprocessor. This would
require some more tinkering. It might become necessary to let the
preprocessor patch the line numbers from outgoing tokens.

The result of the few days hacking can be found here:

http://wwwhome.cs.utwente.nl/~klaren/antlr/CPPParser.tar.bz2

This is basically released for the sake of releasing it. It's not a
finished product, more a proof of concept. I probably don't have time to
wrap it up into something presentable soon, so that's why I release it.

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893755  ----
-----+++++*****************************************************+++++++++-------
  Wo das Chaos auf die Ordnung trifft, gewinnt meist das Chaos, weil es
  besser organisiert ist. --- Friedrich Nietzsche



 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/
 



More information about the antlr-interest mailing list