[antlr-interest] Preserving ALL comments!

Andy Tripp antlr at jazillian.com
Fri Feb 24 07:11:02 PST 2006


Well, it turns out I haven't written much that's concrete about how to
preserve whitespace.

I'd like to write a paper with more details in the next few weeks. I'm
actually working on this stuff now, too. For a good chuckle, take a look
at this:
/* a multiline
* comment
*/

Imagine that's your C input, indented as shown. Now it needs to be
indented in your Java output because it's inside a class:
class whatever {
   /* a multiline
    * comment
    */
}

...so you actually have to add indenting to the comment token's text
itself. So you have to say "after every newline in a multiline comment,
add indentation based on the current indentation level".
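That rule can be sketched in a few lines of Java. This is only an illustration, not the actual Jazillian code: the class name CommentIndenter, the reindent() helper and the three-space indent are made up for the example, and the comment text is assumed to be a plain String.

```java
// Hypothetical helper, not part of Jazillian: shows the "add indentation
// after every newline" rule applied to a comment token's text.
class CommentIndenter {

    /**
     * Returns the comment text with `indent` inserted after every newline.
     * The first line is left alone, because whatever prints the token is
     * expected to indent it like any other token.
     */
    static String reindent(String commentText, String indent) {
        return commentText.replace("\n", "\n" + indent);
    }

    public static void main(String[] args) {
        // Comment as it might appear at column 0 in the C input.
        String comment = "/* a multiline\n * comment\n */";
        String indent = "   "; // assumed: one nesting level is three spaces

        System.out.println("class whatever {");
        System.out.println(indent + reindent(comment, indent));
        System.out.println("}");
    }
}
```

A real translator would presumably derive the indent string from the current nesting depth rather than hard-coding it.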

Below is what I did write (Dick Cheney gave me permission to declassify
it :) , but as you can see, it gives no real specifics, other than
describing the general placement technique.

Andy

-----------------------------------------
The CommentSaver class is responsible for saving comment information. The
|CommentSaver.removeWhitespace()| method, in addition to actually
removing whitespace, splits each file up into separate lines. The
|Source.addFile()| method is called to read in each C file, and as it's
removing whitespace, it also builds up a list of line descriptions if
comments are being saved. So the code in |Source.addFile()| looks like
this:

        List<List<Token>> linesInFile =
            CommentSaver.getInstance().removeWhitespace(this, javaFileName);
        //System.out.println("addFile: fileName=" + fileName + " size=" + linesInFile.size());
        // saveComments == false indicates that the file doesn't have any comments,
        // because we generated it ourselves.
        // keepComments == false indicates that the user asked not to keep the comments.
        // Combine the two:
        if (Parameters.keepComments && saveComments) {
            List<LineDescription> list = lineDescriptions.create(this, linesInFile);
            lineDescriptions.add(this, javaFileName, list);
        }

Thus, we keep a single LineDescriptions object around, which contains a
set of "loose descriptions" for each of the lines of the input C files.
The |LineDescriptions.create()| method processes an entire C file and
returns a List of LineDescription objects. The constructor for the
LineDescription class takes a list of tokens on the line, and creates a
"loose description" of the line from that.

Most lines are simply described by their first token. For example, if 
the line starts with "if", we set |LineDescription.LineType| to 
|LineType.IF|. Sometimes, we have to examine the whole line a little 
closer in order to categorize it. For example, suppose we have the line 
"int a = 3;". The |LineDescription.lookForDeclaration()| method is smart 
enough to look through that line and see that it looks like a 
declaration, and then a call to |source.isVariableDeclaration()| tells 
us that it's a variable declaration (as opposed to a function declaration).

So the main information that a LineDescription contains is the LineType.
One of the types of lines is |MultiLineComment|, and another is
|SingleLineComment|.
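To make the "loose description" idea concrete, here is a rough sketch of classifying a line mostly by its first token. It is not the real LineDescription code: tokens are simplified to Strings, the LineClassifier class, the TYPE_KEYWORDS set and the VariableDeclaration and Other values are invented for the example, and the declaration check is far cruder than what |lookForDeclaration()| and |source.isVariableDeclaration()| would need to do.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: tokens are plain Strings here, not ANTLR Token objects.
class LineClassifier {

    // IF, MultiLineComment and SingleLineComment are named in the text above;
    // VariableDeclaration and Other are assumptions added for the example.
    enum LineType { IF, MultiLineComment, SingleLineComment, VariableDeclaration, Other }

    // Assumed, deliberately incomplete list of C type keywords.
    private static final Set<String> TYPE_KEYWORDS = new HashSet<>(
        Arrays.asList("int", "char", "short", "long", "float", "double", "unsigned"));

    static LineType classify(List<String> tokens) {
        if (tokens.isEmpty()) return LineType.Other;
        String first = tokens.get(0);
        // Most lines are described by their first token alone.
        if (first.startsWith("/*")) return LineType.MultiLineComment;
        if (first.startsWith("//")) return LineType.SingleLineComment;
        if (first.equals("if")) return LineType.IF;
        // Some lines need a closer look at the rest of the tokens.
        if (looksLikeVariableDeclaration(tokens)) return LineType.VariableDeclaration;
        return LineType.Other;
    }

    // Crude check: type keyword, identifier, no "(" right after it, ends in ";".
    static boolean looksLikeVariableDeclaration(List<String> tokens) {
        if (tokens.size() < 3) return false;
        String name = tokens.get(1);
        return TYPE_KEYWORDS.contains(tokens.get(0))
            && !name.isEmpty()
            && Character.isJavaIdentifierStart(name.charAt(0))
            && !tokens.get(2).equals("(")
            && tokens.get(tokens.size() - 1).equals(";");
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("int", "a", "=", "3", ";")));
        System.out.println(classify(Arrays.asList("if", "(", "a", ")", "{")));
        System.out.println(classify(Arrays.asList("/* a comment */")));
    }
}
```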


     Restoring Comments

Even though the storing of comments was not done by a rule, the
restoring of comments is done by the rule CommentRestoreRule. Comments
are restored on a file-by-file basis, so CommentRestoreRule extends
OncePerFileRule and has an applyToFile() method that's called for each
file. It creates a single LineDescriptions object called "current", and
adds LineDescription objects to it. So we now have loose line
descriptions of both the original C code and the translated Java code.
The |CommentRestoreRule.cleanup()|
method, which is called just once, calls |align()| to match up the two 
sets of lines, and |addComments()| to add the comments into the Java 
code. The "alignment" algorithm basically loops through both sets of 
line descriptions together, keeping track of which file and function we 
are in. So a comment that comes before the third variable declaration in 
function f() in file myfile.c will get placed before the third variable 
declaration in function f() in file Myfile.java.
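The placement idea (a comment before the Nth line of some type in function f() ends up before the Nth line of that type in the translated f()) can be sketched as follows. This is a simplified stand-in, not the actual |align()| and |addComments()| code: it uses two passes over made-up Line objects keyed by function, type and ordinal instead of walking both lists together, it ignores file names, and it silently drops comments that have no matching line. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of comment placement by (function, line type, ordinal).
class CommentAligner {

    /** Made-up "loose description" of one line. */
    static final class Line {
        final String function; // enclosing function name
        final String type;     // e.g. "DECL", "IF", "COMMENT"
        final String text;     // source text of the line
        Line(String function, String type, String text) {
            this.function = function; this.type = type; this.text = text;
        }
        boolean isComment() { return type.equals("COMMENT"); }
    }

    /** Builds a key such as "f/DECL#3" for the third DECL line seen in f. */
    static String nextKey(Map<String, Integer> counts, Line line) {
        String prefix = line.function + "/" + line.type;
        int ordinal = counts.merge(prefix, 1, Integer::sum);
        return prefix + "#" + ordinal;
    }

    static List<String> restore(List<Line> cLines, List<Line> javaLines) {
        // Pass 1: remember which C comments sit directly before which keyed line.
        Map<String, List<String>> pending = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        List<String> buffered = new ArrayList<>();
        for (Line line : cLines) {
            if (line.isComment()) { buffered.add(line.text); continue; }
            String key = nextKey(counts, line);
            if (!buffered.isEmpty()) {
                pending.put(key, buffered);
                buffered = new ArrayList<>();
            }
        }
        // Pass 2: walk the Java lines and emit the comments before the same key.
        List<String> out = new ArrayList<>();
        counts.clear();
        for (Line line : javaLines) {
            List<String> comments = pending.remove(nextKey(counts, line));
            if (comments != null) out.addAll(comments);
            out.add(line.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Line> c = Arrays.asList(
            new Line("f", "DECL", "int a;"),
            new Line("f", "DECL", "int b;"),
            new Line("f", "COMMENT", "/* about c */"),
            new Line("f", "DECL", "int c;"));
        List<Line> java = Arrays.asList(
            new Line("f", "DECL", "int a;"),
            new Line("f", "DECL", "int b;"),
            new Line("f", "DECL", "int c;"));
        for (String s : restore(c, java)) System.out.println(s);
    }
}
```

Keying on an ordinal within the function keeps the match "loose": the Java line does not have to look like the C line, it only has to be the same kind of line in the same position.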
