[antlr-interest] Preserving ALL comments!
Andy Tripp
antlr at jazillian.com
Fri Feb 24 07:11:02 PST 2006
Well, it turns out I haven't written much that's concrete about how to
preserve whitespace. I'd like to write a paper with more details in the
next few weeks. I'm actually working on this stuff now, too. For a good
chuckle, take a look at this:
/* a multiline
 * comment
 */
Imagine that's in C, indented as shown, as your input. And now, it needs
to be indented
in your Java output because it's inside a class:
class whatever {
    /* a multiline
     * comment
     */
}
...so you actually have to add indenting to the comment token's text
itself. So you have to
say "after every newline in a multiline comment, add indentation based
on the current indentation
level".
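Here is a minimal sketch of that re-indenting step in Java. The class
name CommentIndenter, the method name, and the idea of passing the
indent unit as a string are all made up for illustration; this is not
the translator's actual code. It assumes the first line of the comment
gets its indentation from whatever prints the token, so only the lines
after each newline are padded:

```java
final class CommentIndenter {
    /**
     * Returns the comment text with the current indentation inserted
     * after every newline. The first line is left alone on the
     * assumption that the caller emits it at the current indentation.
     */
    static String indentContinuationLines(String commentText, int level, String unit) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < level; i++) {
            pad.append(unit);
        }
        // String.replace substitutes every occurrence of the literal "\n".
        return commentText.replace("\n", "\n" + pad);
    }
}
```

For the comment above, one level of four-space indentation turns
" * comment" into "     * comment", so the stars still line up under
the opening "/*" once the first line is printed four columns in.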
Below is what I did write (Dick Cheney gave me permission to declassify
it :) ), but as you can see, it gives no real specifics, other than
describing the general placement technique.
Andy
-----------------------------------------
The CommentSaver class is responsible for saving comment information.
The |CommentSaver.removeWhitespace()| method, in addition to actually
removing whitespace, splits each file up into separate lines. The
|Source.addFile()| method is called to read in each C file, and as it's
removing whitespace, it also builds up a list of line descriptions if
comments are being saved. So the code in |Source.addFile()| looks like
this:
    List<List<Token>> linesInFile =
        CommentSaver.getInstance().removeWhitespace(this, javaFileName);
    //System.out.println("addFile: fileName=" + fileName + " size=" + linesInFile.size());

    // saveComments == false indicates that the file doesn't have any
    // comments, because we generated it ourselves.
    // keepComments == false indicates that the user asked not to keep
    // the comments.
    // Combine the two:
    if (Parameters.keepComments && saveComments) {
        List<LineDescription> list = lineDescriptions.create(this, linesInFile);
        lineDescriptions.add(this, javaFileName, list);
    }
Thus, we keep a single LineDescriptions object around, which
contains a set of "loose descriptions" for each of the lines of the
input C files. The |LineDescriptions.create()| method processes an
entire C file and returns a List of LineDescription objects. The
constructor for the LineDescription class takes a list of tokens
on the line, and creates a "loose description" of the line from that.
Most lines are simply described by their first token. For example, if
the line starts with "if", we set |LineDescription.LineType| to
|LineType.IF|. Sometimes, we have to examine the whole line a little
closer in order to categorize it. For example, suppose we have the line
"int a = 3;". The |LineDescription.lookForDeclaration()| method is smart
enough to look through that line and see that it looks like a
declaration, and then a call to |source.isVariableDeclaration()| tells
us that it's a variable declaration (as opposed to a function declaration).
So the main information that a LineDescription contains is the LineType.
One of the types of lines is |MultiLineComment|, and another is
|SingleLineComment|.
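To make the "loose description" idea concrete, here is a rough sketch
in Java. Everything in it is hypothetical: the class LineClassifierSketch,
the Kind enum, and the use of plain strings instead of ANTLR tokens are
stand-ins, and the declaration check is far cruder than what
|lookForDeclaration()| and |source.isVariableDeclaration()| would
really have to do. It only shows the shape of the technique: classify
by the first token, and look further along the line when that isn't
enough.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LineClassifierSketch {
    // Hypothetical stand-in for LineType; only a few kinds are shown.
    enum Kind {
        IF, WHILE, FOR, RETURN,
        MULTI_LINE_COMMENT, SINGLE_LINE_COMMENT,
        VARIABLE_DECLARATION, OTHER
    }

    // Assumption: a handful of built-in C types is enough for the sketch.
    private static final Set<String> TYPE_KEYWORDS =
        new HashSet<>(Arrays.asList("int", "char", "long", "short", "float", "double"));

    /** Classifies one line, given the text of its tokens in order. */
    static Kind classify(List<String> tokens) {
        if (tokens.isEmpty()) {
            return Kind.OTHER;
        }
        String first = tokens.get(0);
        if (first.startsWith("/*")) return Kind.MULTI_LINE_COMMENT;
        if (first.startsWith("//")) return Kind.SINGLE_LINE_COMMENT;
        // Most lines are described by their first token alone.
        switch (first) {
            case "if":     return Kind.IF;
            case "while":  return Kind.WHILE;
            case "for":    return Kind.FOR;
            case "return": return Kind.RETURN;
            default:       break;
        }
        // Otherwise, examine the rest of the line a little closer.
        return looksLikeVariableDeclaration(tokens) ? Kind.VARIABLE_DECLARATION : Kind.OTHER;
    }

    /** Crude check: a type keyword, then a name, then '=' or ';'. */
    static boolean looksLikeVariableDeclaration(List<String> tokens) {
        if (tokens.size() < 3 || !TYPE_KEYWORDS.contains(tokens.get(0))) {
            return false;
        }
        String name = tokens.get(1);
        String next = tokens.get(2);
        return !name.isEmpty()
            && Character.isJavaIdentifierStart(name.charAt(0))
            && (next.equals("=") || next.equals(";"));
    }
}
```

With this sketch, the tokens of "int a = 3;" come back as
VARIABLE_DECLARATION, while "int f ( )" falls through to OTHER because
the third token is "(" rather than "=" or ";".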
Restoring Comments
Even though the storing of comments was not done by a rule, the
restoring of comments is done by rule CommentRestoreRule. Comments are
restored on a file-by-file basis, so CommentRestoreRule extends
OncePerFileRule and has an applyToFile() method that's called for each
file. It creates a single LineDescriptions object called "current", and
adds LineDescription objects to it. So we now have loose line
descriptions of both the original C code and the translated Java code.
The |CommentRestoreRule.cleanup()| method, which is called just once,
calls |align()| to match up the two sets of lines, and |addComments()|
to add the comments into the Java code. The "alignment" algorithm
basically loops through both sets of line descriptions together,
keeping track of which file and function we are in. So a comment that
comes before the third variable declaration in function f() in file
myfile.c will get placed before the third variable declaration in
function f() in file Myfile.java.
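One way to picture the alignment, as a sketch only: key every
non-comment line by (function, line type, how many lines of that type
we've seen so far in that function), hang each comment on the key of
the line that follows it in the C file, and then look up the same keys
while walking the Java lines. The class CommentAlignSketch, its Line
type, and the string-valued line types are all invented for this
example; the real |align()| and |addComments()| are not shown in this
write-up and may well work differently. The sketch handles one file at
a time, assumes the Java lines contain no comments yet, and silently
drops a comment that has no following line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CommentAlignSketch {
    /** Hypothetical loose description: enclosing function, line type, line text. */
    static final class Line {
        final String function;
        final String type;
        final String text;

        Line(String function, String type, String text) {
            this.function = function;
            this.type = type;
            this.text = text;
        }
    }

    /** Returns the Java lines with each C comment placed before its matching line. */
    static List<String> restore(List<Line> cLines, List<Line> javaLines) {
        Map<String, List<String>> commentsByKey = new HashMap<>();
        Map<String, Integer> counts = new HashMap<>();
        List<String> pending = new ArrayList<>();

        // Pass 1: attach each run of comments to the C line that follows it.
        for (Line line : cLines) {
            if (line.type.equals("COMMENT")) {
                pending.add(line.text);
                continue;
            }
            String key = nextKey(counts, line);
            if (!pending.isEmpty()) {
                commentsByKey.put(key, pending);
                pending = new ArrayList<>();
            }
        }

        // Pass 2: recompute the same keys over the Java lines and emit comments.
        counts.clear();
        List<String> out = new ArrayList<>();
        for (Line line : javaLines) {
            List<String> comments = commentsByKey.get(nextKey(counts, line));
            if (comments != null) {
                out.addAll(comments);
            }
            out.add(line.text);
        }
        return out;
    }

    /** Builds a key such as "f/DECL#3": the third DECL line inside function f. */
    private static String nextKey(Map<String, Integer> counts, Line line) {
        String base = line.function + "/" + line.type;
        int ordinal = counts.merge(base, 1, Integer::sum);
        return base + "#" + ordinal;
    }
}
```

Because the key counts lines per type, an extra statement that the
translation inserted between two declarations doesn't throw off the
match: the comment still lands before the third declaration in f().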