[antlr-interest] SeeGramWrap: Yet another refactoring

Terence Parr parrt at cs.usfca.edu
Wed Mar 3 17:54:42 PST 2004


Wow!  Great work, Ed!  Should we make the text of this an "article" at 
antlr.org?

Terence

On Mar 3, 2004, at 5:19 PM, edcjones wrote:

> I have placed a re-refactored version of "SeeGramWrap-03.02.2004.tgz"
> on my webpage at "http://members.tripod.com/~edcjones/pycode.html".
> SeeGramWrap parses a piece of C code and the resulting parse tree is
> output in man and machine readable form. The result can be used for
> program transformations. Since a particular trnsformation algorithm
> may not require all the information present in the tree, the user can
> select what to output.
>
> This program has been written and tested only under linux.
>
> Thanks to John Mitchell and Monty Zukowski for "cgram.tgz". Every
> parser generator need to have a good C grammar. Also thanks to
> Terrence Parr for ANTLR (http://www.antlr.org/).
>
> ==============================================================
>                           CONTEXT
>
> Python (http://www.python.org) is a scripting language that is both
> easy to read and easy to write. It is so easy to read that I can
> usually read my own code six months after I write it. But Python is
> slow. If speed is needed for part of a project, Pythonistas write C
> code that call functions in Python's large API. It also common to wrap
> large C libraries so they can be called by Python. The wrapping code
> is repetitive and there may be a lot of it so methods have been
> developed for automated wrapping.
>
> The best-known approach is SWIG (http://www.swig.org/). For complex
> wrappings, SWIG requires the writing of "typemaps", an unintuitive
> process where pieces of C code you write are spliced into the wrapper
> code generated by SWIG.
>
> Another wrapper related approach is Pyrex which is found at
>
>     http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
>
> Pyrex has its own repetitive boilerplate that has to be written. But
> the Pyrex boilerplate is so straightforward that it can be taught
> algorithmically. See "Michael's Quick Guide to Pyrex" at
> "http://ldots.org/pyrex-guide/".
>
> I think that the Pyrex boilerplate is _so_ straightforward that it can
> be machine generated. Therefore I have been sporatically developing
> software to do this. A thoroughly buggy version of this is on my web
> page, "http://members.tripod.com/~edcjones". It is called
> "cgram.tar.gz" (The name will be changed). Look at it but don't use
> it. "SeeGramWrap" is a major revision of the front end of 
> "cgram.tr.gz".
>
> I think the automatic-wrapper program can be made to work. It might be
> easier to use than SWIG. It is still a lot of work to prepare complex
> C header files. What we have is really a "program transformation" or
> "tree transformation" problem.
>
> I think some of the issues are:
>
> 1. Since parser generators have a long and steep learning curve, I
> prefer to use them as black boxes which generate parsers which output
> results that I can analyze using Python. The parser created by a
> parser generator should output trees in two formats: one easy to look
> at and another that a program can easily read. For examples, see below.
>
> 2. I find trees very easy to work with. I want the trees to be front
> and center and highly visible. I prefer to "manipulate a tree" rather
> than "fire a rule".
>
> 3. The most common type of C macro has a type as one of its arguments:
>
>     #define CAST(x, type) (type *) x
>
> How can these be automatically wrapped for Python which is a
> dynamically typed language?
>
> =============================================================
>                     TECHNICAL OVERVIEW
>
> I use some C grammars associated with ANTLR. The grammar package is
> called "cgram". See "http://www.antlr.org/resources.html".
>
> In "cgram" there is a java program "TestThrough.java" which parses C
> code into an AST then runs a tree grammar on the AST and outputs the
> original code. The tree grammar is named "GnuCEmitter.g". I work with
> this grammar because the terminal tokens are printed in the correct
> order. I modified the grammar turning it into a template. A piece of
> the original "GnuCEmitter.g" is:
>
> ----
> typeQualifier
>         :       a:"const"                       { print( a ); }
>         |       b:"volatile"                    { print( b ); }
>         ;
> ----
>
> The modified version is:
>
> ----
> typeQualifier
>         :       a:"const"                       { <@ a @> }
>         |       b:"volatile"                    { <@ b @> }
>         ;
> ----
>
> In this template, strings of the form "<@ ... @>" will each be
> replaced by a set of print statements. Moreover the entire rule will
> be wrapped by prints.  The template is used in
> "emitter/insert_prints.py". If "insert_prints.py" is run the result is:
>
> ----
> typeQualifier
>   { if ( inputState.guessing==0 ) {
>           print(Open);
>           print("typeQualifier");
>        }
>     }
>         :  (
>                 a:"const"        {  print(Open);
> print("typeQualifier.0"); print( a ); print(Close); }
>         |       b:"volatile"     {  print(Open);
> print("typeQualifier.1"); print( b ); print(Close); }
>            )
>   { currentOutput.print(Close + MyTokenSep); }
>         ;
> ----
>
> If the original C program , "temp2.c", is
>
>     char* s = "ab";
>
> The output of the modified emitter grammar is "temp2.c.data":
>
> ----
>     <<OPEN>>                 <<OPEN>>                 <<OPEN>>
>     externalList             declarator               expr
>     <<OPEN>>                 <<OPEN>>                 <<OPEN>>
>     externalDef              pointerGroup             primaryExpr
>     <<OPEN>>                 <<OPEN>>                 <<OPEN>>
>     declaration              pointerGroup.0           stringConst
>     <<OPEN>>                 *                        <<OPEN>>
>     declSpecifiers           <<CLOSE>>                stringConst.0
>     <<OPEN>>                 <<CLOSE>>                "ab"
>     typeSpecifier            <<OPEN>>                 <<CLOSE>>
>     <<OPEN>>                 declarator.0             <<CLOSE>>
>     typeSpecifier.1          s                        <<CLOSE>>
>     char                     <<CLOSE>>                <<CLOSE>>
>     <<CLOSE>>                <<CLOSE>>                <<CLOSE>>
>     <<CLOSE>>                <<OPEN>>                 <<CLOSE>>
>     <<CLOSE>>                initDecl.0               <<CLOSE>>
>     <<OPEN>>                 =                        ;
>     initDeclList             <<CLOSE>>                <<CLOSE>>
>     <<OPEN>>                 <<OPEN>>                 <<CLOSE>>
>     initDecl                 initializer              <<CLOSE>>
> ----
>
> This output can be processed by "tree.py" to produce "temp2.c.nest"
>
> ----
> (externalList,
>   (externalDef,
>     (declaration,
>       (declSpecifiers,
>         (typeSpecifier,
>           (typeSpecifier.1, |char|))),
>       (initDeclList,
>         (initDecl,
>           (declarator,
>             (pointerGroup,
>               (pointerGroup.0, |*|)),
>             (declarator.0, |s|)),
>           (initDecl.0, |=|),
>           (initializer,
>             (expr,
>               (primaryExpr,
>                 (stringConst,
>                   (stringConst.0, |"ab"|))))))), |;|)))
> ----
>
> or "temp2.c.src":
>
>     char * s = "ab" ;
>
> If "temp2.c.src" is put through the entire process itself we get
> "temp2.c.src.src" which is identical to "temp2.c.src". This test is
> done by "docheck.py".
>
> In the ".data" or ".nest" files the tokens from the original C code
> are in the correct order. It is easy to recover
>
>     ('char', '*', 's', '=', '"ab"', ';')
>
> Thanks,
> Ed Jones
>
>
>
>
>
>
> Yahoo! Groups Links
>
>
>
>
>
>
--
Professor Comp. Sci., University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing





 
Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/
 



More information about the antlr-interest mailing list