[antlr-interest] (unknown)

Fri Feb 13 18:01:41 PST 2004

I have placed "SeeGramWrap-02.12.2004.tgz" on my webpage at
"http://members.tripod.com/~edcjones/pycode.html". SeeGramWrap parses
a piece of C code and outputs the resulting parse tree in human- and
machine-readable form. The result can be used for program
transformations. Since a particular transformation algorithm may not
require all the information present in the tree, the user can select
what to output.

I use the C grammar "cgram.tgz" associated with ANTLR. See
"http://www.antlr.org/resources.html".

In "cgram" is a Java program "TestThrough.java" which parses C code
into an AST, then runs a tree grammar on the AST and outputs the
original code. The tree grammar is named "GnuCEmitter.g". I work with
this grammar because the terminal tokens are printed in the correct
order. I modified the grammar, turning it into a template. A piece of
the original "GnuCEmitter.g" is:

----
structOrUnionBody
        :       ( (ID LCURLY) => i1:ID lc1:LCURLY   { print( i1 ); print ( "{" ); tabs++; }
                        ( structDeclarationList )?
                        rc1:RCURLY                  { tabs--; print( rc1 ); }
                |   lc2:LCURLY                      { print( lc2 ); tabs++; }
                    ( structDeclarationList )?
                    rc2:RCURLY                      { tabs--; print( rc2 ); }
                | i2:ID                     { print( i2 ); }
                )
        ;
----

The modified version is:

----
@structOrUnionBody
#        :       ( (ID LCURLY) => i1:ID lc1:LCURLY   { <@ i1 @> print( "{" ); tabs++; }
                        ( structDeclarationList )?
                        rc1:RCURLY                  { tabs--; <@ rc1 @> }
#                |   lc2:LCURLY                      { <@ "{" @> tabs++; }
                    ( structDeclarationList )?
                    rc2:RCURLY                      { tabs--; <@ rc2 @> }
                | i2:ID                     { <@ i2 @> }
                )
        ;
----

In this template, each string of the form "<@ ... @>" is replaced by a
set of print statements. Sometimes the rule as a whole is also wrapped
in prints. The template is used by
"mystuff/emitter/insert_prints.py". If certain options are set, an "@"
or "#" at the beginning of a line has a special meaning. If an "@" is in
front of the rule name, information is printed when the rule is entered
or exited. If a "#" is NOT at the beginning of a line, then only C
tokens are output, just as in the original "GnuCEmitter.g". If a "#"
is present, more information is output. Detailed documentation of this
process is in "mystuff/emitter/insert_prints.py". If "insert_prints.py
SOME SOME FULL" is run, the result is:

----
structOrUnionBody
  { System.out.print( "(" ); print( "structOrUnionBody" ); }
        :  (
                ( (ID LCURLY) => i1:ID lc1:LCURLY   { currentOutput.print("(\"structOrUnionBody.0\", "); print( i1 ); currentOutput.print("), ");  print( "{" ); tabs++; }
                        ( structDeclarationList )?
                        rc1:RCURLY                  { tabs--; print( rc1 ); }
                |   lc2:LCURLY                      { currentOutput.print("(\"structOrUnionBody.2\", "); print( "{" ); currentOutput.print("), ");  tabs++; }
                    ( structDeclarationList )?
                    rc2:RCURLY                      { tabs--; print( rc2 ); }
                | i2:ID                     { print( i2 ); }
                )
           )
  { System.out.print( "), " ); }
        ;
----
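
As a rough illustration of the substitution described above (this is
not the code in "insert_prints.py", only a sketch of the idea), the
marker expansion could look like the Python below. The function name
"expand_rule", the regular expression, and the way the marker index is
counted are all assumptions; the "@" rule-name prefix and the option
handling are left out.

```python
import re

# Matches one template marker such as <@ i1 @> or <@ "{" @>.
MARKER = re.compile(r'<@\s*(.*?)\s*@>')


def expand_rule(lines, rule_name):
    """Expand every <@ X @> marker in the lines of one grammar rule.

    Assumption: markers are numbered 0, 1, 2, ... in the order they
    appear in the rule, whether or not their line starts with '#'.
    """
    out = []
    index = 0  # running marker number within this rule
    for line in lines:
        verbose = line.startswith('#')
        if verbose:
            # Drop the '#' flag but keep the column alignment.
            line = ' ' + line[1:]

        def replace(match):
            nonlocal index
            label = '%s.%d' % (rule_name, index)
            index += 1
            plain = 'print( %s );' % match.group(1)
            if not verbose:
                # Unflagged line: emit only the C token.
                return plain
            # Flagged line: wrap the token with labelling output.
            return ('currentOutput.print("(\\"%s\\", "); %s '
                    'currentOutput.print("), ");' % (label, plain))

        out.append(MARKER.sub(replace, line))
    return out


if __name__ == '__main__':
    template = [
        '#   lc2:LCURLY { <@ "{" @> tabs++; }',
        '    rc2:RCURLY { tabs--; <@ rc2 @> }',
    ]
    for expanded in expand_rule(template, 'structOrUnionBody'):
        print(expanded)
```

Using a replacement function rather than a replacement string avoids
having to escape the backslashes in the generated Java code.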

If the original C program, "silly.c", is

----
int i;
----

The output of the modified emitter grammar is "silly.c.data":

----
<<OPEN>>
externalList
<<OPEN>>
externalDef
<<OPEN>>
declaration
<<OPEN>>
declSpecifiers
<<OPEN>>
typeSpecifier
<<OPEN>>
typeSpecifier.3
int
<<CLOSE>>
<<CLOSE>>
<<CLOSE>>
<<OPEN>>
initDeclList
<<OPEN>>
initDecl
<<OPEN>>
declarator
<<OPEN>>
declarator.0
i
<<CLOSE>>
<<CLOSE>>
<<CLOSE>>
<<CLOSE>>
;
<<CLOSE>>
<<CLOSE>>
<<CLOSE>>
----

This output can be processed by "printtree.py" to produce "silly.c.nest":

----
['externalList', 
 ['externalDef', 
  ['declaration', 
   ['declSpecifiers', 
    ['typeSpecifier', 
     ['typeSpecifier.3', 
      'int']]], 
   ['initDeclList', 
    ['initDecl', 
     ['declarator', 
      ['declarator.0', 
       'i']]]], 
   ';']]]
----

and "silly.c.src":

----
int i ;
----
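
The step from the ".data" form to the nested-list form can be sketched
with a simple stack, as below. This is not the code in "printtree.py";
the function name "parse_data" is hypothetical, and the sketch assumes
that every marker, rule label, and token sits on a line of its own,
exactly as in the "silly.c.data" listing.

```python
OPEN, CLOSE = '<<OPEN>>', '<<CLOSE>>'


def parse_data(lines):
    """Turn <<OPEN>>/<<CLOSE>>-delimited lines into nested lists."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for raw in lines:
        item = raw.rstrip('\n')
        if item == OPEN:
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif item == CLOSE:
            if len(stack) == 1:
                raise ValueError('<<CLOSE>> without matching <<OPEN>>')
            stack.pop()
        else:
            # A rule label or a C token.
            stack[-1].append(item)
    if len(stack) != 1:
        raise ValueError('<<OPEN>> without matching <<CLOSE>>')
    return root


if __name__ == '__main__':
    sample = [
        '<<OPEN>>', 'declarator',
        '<<OPEN>>', 'declarator.0', 'i', '<<CLOSE>>',
        '<<CLOSE>>',
    ]
    print(parse_data(sample))
```

The function returns a list of top-level trees, so for a file like
"silly.c.data" the single tree would be the first element.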

If "silly.c.src" is itself put through the entire process, we get
"silly.c.src.src", which is identical to "silly.c.src".

In the ".data" or ".nest" files the tokens from the original C code
are in the correct order. It is easy to recover ("int", "i", ";").
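
A minimal sketch of that recovery from the nested-list form follows.
It assumes, as in the "silly.c.nest" listing, that the first element
of every list is a rule label and that every other string is a C
token; the function name "tokens" is made up for this example.

```python
def tokens(node):
    """Yield the C tokens of a nested-list tree in source order."""
    for child in node[1:]:  # node[0] is the rule label, not a token
        if isinstance(child, list):
            yield from tokens(child)
        else:
            yield child


# The tree from "silly.c.nest".
tree = ['externalList',
        ['externalDef',
         ['declaration',
          ['declSpecifiers',
           ['typeSpecifier',
            ['typeSpecifier.3',
             'int']]],
          ['initDeclList',
           ['initDecl',
            ['declarator',
             ['declarator.0',
              'i']]]],
          ';']]]

if __name__ == '__main__':
    print(list(tokens(tree)))     # ['int', 'i', ';']
    print(' '.join(tokens(tree)))  # int i ;
```

Joining the tokens with single spaces gives the same text as
"silly.c.src".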
