[antlr-interest] SeeGramWrap (2nd try)

mzukowski at yci.com mzukowski at yci.com
Wed Feb 18 15:29:18 PST 2004


Very nice!  Thanks for doing this and posting about it.  Was there any
reason for doing it?  Working on an editor, perhaps?

Monty

-----Original Message-----
From: edcjones [mailto:edcjones at yahoo.com] 
Sent: Saturday, February 14, 2004 7:35 AM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] SeeGramWrap (2nd try)

I have placed "SeeGramWrap-02.12.2004.tgz" on my webpage at
"http://members.tripod.com/~edcjones/pycode.html". SeeGramWrap parses
a piece of C code and outputs the resulting parse tree in a form that
is both human- and machine-readable. The result can be used for program
transformations. Since a particular transformation algorithm may not
require all the information present in the tree, the user can select
what to output.

I use the C grammar "cgram.tgz" associated with ANTLR. See
"http://www.antlr.org/resources.html".

In "cgram" is a Java program, "TestThrough.java", which parses C code
into an AST, then runs a tree grammar on the AST and outputs the
original code. The tree grammar is named "GnuCEmitter.g". I work with
this grammar because the terminal tokens are printed in the correct
order. I modified the grammar, turning it into a template. A piece of
the original "GnuCEmitter.g" is:

----
structOrUnionBody
    :   ( (ID LCURLY) => i1:ID lc1:LCURLY   { print( i1 ); print ( "{" ); tabs++; }
            ( structDeclarationList )?
            rc1:RCURLY                      { tabs--; print( rc1 ); }
        |   lc2:LCURLY                      { print( lc2 ); tabs++; }
            ( structDeclarationList )?
            rc2:RCURLY                      { tabs--; print( rc2 ); }
        |   i2:ID                           { print( i2 ); }
        )
    ;
----

The modified version is:

----
@structOrUnionBody
#   :   ( (ID LCURLY) => i1:ID lc1:LCURLY   { <@ i1 @> print( "{" ); tabs++; }
            ( structDeclarationList )?
            rc1:RCURLY                      { tabs--; <@ rc1 @> }
#       |   lc2:LCURLY                      { <@ "{" @> tabs++; }
            ( structDeclarationList )?
            rc2:RCURLY                      { tabs--; <@ rc2 @> }
        |   i2:ID                           { <@ i2 @> }
        )
    ;
----

In this template, each string of the form "<@ ... @>" is
replaced by a set of print statements. Sometimes the rule may also be
wrapped by prints. The template is used by
"mystuff/emitter/insert_prints.py". If certain options are set, a "@"
or "#" at the beginning of a line has a special meaning. If an "@" is in
front of the rule name, information is printed when the rule is entered
or exited. If a "#" is NOT at the beginning of a line, then only C
tokens are output, just as in the original "GnuCEmitter.g". If a "#"
is present, more information is output. Detailed documentation of this
process is in "mystuff/emitter/insert_prints.py". If "insert_prints.py
SOME SOME FULL" is run, the result is:

----
structOrUnionBody
{ System.out.print( "(" ); print( "structOrUnionBody" ); }
: (
( (ID LCURLY) => i1:ID lc1:LCURLY {
currentOutput.print("(\"structOrUnionBody.0\", "); print( i1 );
currentOutput.print("), "); print( "{" ); tabs++; }
( structDeclarationList )?
rc1:RCURLY { tabs--; print(
rc1 ); }
| lc2:LCURLY {
currentOutput.print("(\"structOrUnionBody.2\", "); print( "{" );
currentOutput.print("), "); tabs++; }
( structDeclarationList )?
rc2:RCURLY { tabs--; print(
rc2 ); }
| i2:ID { print( i2 ); }
)
)
{ System.out.print( "), " ); }
;
----
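
The marker expansion described above can be sketched in a few lines of
Python. This is only an illustration and is NOT the code in
"insert_prints.py". The function name "expand_markers" is hypothetical,
and two things are assumed from the single example shown: markers are
numbered in order of appearance within the rule, and a marker gets the
labelled wrapper only when its line starts with "#". Handling of the
"@" on the rule name and of the command-line options is left out.

```python
import re

# Matches "<@ something @>" and captures the "something".
MARKER = re.compile(r"<@\s*(.*?)\s*@>")


def expand_markers(rule_name, lines):
    """Expand '<@ x @>' markers in the body lines of one rule.

    Assumptions (inferred from one example, not from insert_prints.py):
    - markers are numbered 0, 1, 2, ... in order of appearance;
    - a leading '#' selects the labelled form for markers on that line.
    """
    out = []
    counter = 0
    for line in lines:
        full = line.startswith("#")
        body = line[1:] if full else line

        def repl(match):
            nonlocal counter
            index = counter
            counter += 1
            plain = "print( %s );" % match.group(1)
            if not full:
                return plain
            label = "%s.%d" % (rule_name, index)
            return ('currentOutput.print("(\\"%s\\", "); %s '
                    'currentOutput.print("), ");' % (label, plain))

        out.append(MARKER.sub(repl, body))
    return out


if __name__ == "__main__":
    template = [
        '# : i1:ID { <@ i1 @> }',
        '  rc1:RCURLY { tabs--; <@ rc1 @> }',
        '# | lc2:LCURLY { <@ "{" @> tabs++; }',
    ]
    for expanded in expand_markers("structOrUnionBody", template):
        print(expanded)
```

Numbering every marker, including those on lines without "#", is what
would give the labels ".0" and ".2" seen in the result above.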

If the original C program, "silly.c", is

----
int i;
----

The output of the modified emitter grammar is "silly.c.data":

----
<<OPEN>>
externalList
<<OPEN>>
externalDef
<<OPEN>>
declaration
<<OPEN>>
declSpecifiers
<<OPEN>>
typeSpecifier
<<OPEN>>
typeSpecifier.3
int
<<CLOSE>>
<<CLOSE>>
<<CLOSE>>
<<OPEN>>
initDeclList
<<OPEN>>
initDecl
<<OPEN>>
declarator
<<OPEN>>
declarator.0
i
<<CLOSE>>
<<CLOSE>>
<<CLOSE>>
<<CLOSE>>
;
<<CLOSE>>
<<CLOSE>>
<<CLOSE>>
----

This output can be processed by "printtree.py" to produce
"silly.c.nest":

----
['externalList',
['externalDef',
['declaration',
['declSpecifiers',
['typeSpecifier',
['typeSpecifier.3',
'int']]],
['initDeclList',
['initDecl',
['declarator',
['declarator.0',
'i']]]],
';']]]
----
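
The step from ".data" to ".nest" can be sketched as follows. This is a
minimal illustration, not the code in "printtree.py"; the function
name "parse_data" is hypothetical, and it assumes the ".data" file
holds exactly one item per line (a "<<OPEN>>", a "<<CLOSE>>", a rule
label, or a token), as in the listing above.

```python
def parse_data(text):
    """Build nested lists from '<<OPEN>>' / '<<CLOSE>>' delimited lines.

    Assumption: one item per line; blank lines are ignored.
    """
    root = []
    stack = [root]          # stack[-1] is the list currently being filled
    for raw in text.splitlines():
        item = raw.strip()
        if not item:
            continue
        if item == "<<OPEN>>":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif item == "<<CLOSE>>":
            if len(stack) == 1:
                raise ValueError("<<CLOSE>> without matching <<OPEN>>")
            stack.pop()
        else:
            stack[-1].append(item)
    if len(stack) != 1:
        raise ValueError("missing <<CLOSE>>")
    # A well-formed file has a single top-level tree.
    return root[0] if len(root) == 1 else root


if __name__ == "__main__":
    import pprint
    sample = "\n".join([
        "<<OPEN>>", "declaration",
        "<<OPEN>>", "typeSpecifier.3", "int", "<<CLOSE>>",
        ";",
        "<<CLOSE>>",
    ])
    pprint.pprint(parse_data(sample))
```

A stack of open lists is enough here because the markers are strictly
nested: every "<<CLOSE>>" ends the most recently opened list.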

and "silly.c.src":

----
int i ;
----

If "silly.c.src" is itself put through the entire process, we get
"silly.c.src.src", which is identical to "silly.c.src".

In the ".data" or ".nest" files the tokens from the original C code
are in the correct order. It is easy to recover ("int", "i", ";").
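
As an illustration of that recovery, here is a short Python sketch that
walks the nested list from "silly.c.nest". It is not taken from the
SeeGramWrap package; the function name "tokens" is hypothetical, and it
assumes that the first element of every list is a rule label and that
every other string is a C token.

```python
def tokens(tree):
    """Yield the C tokens of a nested-list tree from left to right.

    Assumption: element 0 of every list is a rule label, not a token.
    """
    for child in tree[1:]:
        if isinstance(child, list):
            yield from tokens(child)
        else:
            yield child


nest = ['externalList',
        ['externalDef',
         ['declaration',
          ['declSpecifiers',
           ['typeSpecifier',
            ['typeSpecifier.3',
             'int']]],
          ['initDeclList',
           ['initDecl',
            ['declarator',
             ['declarator.0',
              'i']]]],
          ';']]]

print(tuple(tokens(nest)))     # ('int', 'i', ';')
print(" ".join(tokens(nest)))  # int i ;
```

Joining the tokens with single spaces gives the same text as
"silly.c.src" for this example.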



 
 