[antlr-interest] Preserving ALL comments!

Andy Tripp antlr at jazillian.com
Wed Feb 22 06:26:34 PST 2006


Damir Kirasić wrote:

>
> We agree that it is easy to remove all Newline, Whitespace and Comment 
> from the token stream.
> Our problem is that we don't know how to "programmatically" 
> determine which comment goes with which code.
> So far, our main objective was to have comments attached as hidden 
> tokens to the corresponding nodes in the AST. And at the same time we 
> would NOT like to change the grammar file.
> For example if we have:
>     main()   /* comment2 */
> comment2 has to be "reassigned" not to BLANK, not to RPAREN but to ID 
> because, according to AST construction from grammar, neither BLANK nor 
> RPAREN will be present in the AST.  So, it seems that we have to know 
> (from inspecting grammar and AST construction) that RPAREN will not be 
> in the AST and skip it as we already skipped the BLANK token.
> As far as we can see, if a comment goes with a token that will not 
> be present in the AST,
> we have to go back and reassign that comment to the next token (which 
> will be present in the AST). And yet, we don't know whether that new 
> candidate token will be present in the AST.
>
> Is it possible? Are we asking too much?
> Should we reformulate our objective? (To preserve comments as HIDDEN 
> tokens attached to "normal" AST nodes).
>
> Thank you for your answer(s).
>
> Damir
>
Yes, it is possible. This is exactly the problem that I had to solve.
See "Preserving the Documentary Structure of Source Code in 
Language-based Transformation Tools" by Michael L. Van De Vanter at Sun 
Laboratories, which talks about the same issue.

What I do is this: just before stripping out the comment/newline/whitespace 
tokens, I give each physical line of input a
"loose description" (e.g. "declaration of variable i", "a for 
statement", "a comment", etc). Then, later, after translation is
done, I attempt to put each comment back with the line that it seemed to 
"go with" at the start. I can send you more
details from my top-secret-highly-classified design document in email if 
you'd like :)

Andy
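
A minimal sketch of the "loose description" idea described above, in Python.
This is not Andy's actual implementation (his design is not shown in this
thread). The function names describe, strip_comments and reattach, the
regex-based descriptions, and the restriction to single-line /* ... */
comments are all assumptions made for illustration. A real tool would
presumably derive descriptions from the token stream or the AST.

```python
import re

# Assumption: only single-line C-style /* ... */ comments are handled.
COMMENT = re.compile(r"/\*.*?\*/")


def describe(code):
    """Return a loose description of one comment-free line (hypothetical rules)."""
    code = code.strip()
    if not code:
        return "blank"
    m = re.match(r"(for|while|if)\b", code)
    if m:
        return "a %s statement" % m.group(1)
    m = re.match(r"(?:int|char|float|double)\s+(\w+)\s*(?:=[^;]*)?;", code)
    if m:
        return "declaration of variable " + m.group(1)
    m = re.search(r"(\w+)\s*\(", code)
    if m:
        return "call or definition of " + m.group(1)
    return "other"


def strip_comments(lines):
    """Remove comments, remembering which line description each one went with."""
    code_lines, notes, pending = [], [], []
    for line in lines:
        comments = COMMENT.findall(line)
        code = COMMENT.sub("", line).rstrip()
        if not code.strip():
            # Comment-only (or blank) line: hold comments for the next code line.
            pending.extend(comments)
            continue
        key = describe(code)
        notes.extend((key, c, "before") for c in pending)
        notes.extend((key, c, "after") for c in comments)
        pending = []
        code_lines.append(code)
    # Comments after the last code line have nothing to attach to.
    notes.extend((None, c, "end") for c in pending)
    return code_lines, notes


def reattach(translated, notes):
    """Put each remembered comment back on the first output line that matches."""
    out, unused = [], list(notes)
    for line in translated:
        key = describe(line)
        mine = [n for n in unused if n[0] == key]
        unused = [n for n in unused if n[0] != key]
        indent = line[:len(line) - len(line.lstrip())]
        out.extend(indent + c for k, c, where in mine if where == "before")
        trailing = " ".join(c for k, c, where in mine if where == "after")
        out.append(line + (" " + trailing if trailing else ""))
    # Comments whose description matched no output line go at the end.
    out.extend(c for k, c, where in unused)
    return out


source = [
    "/* entry point */",
    "main()   /* comment2 */",
    "{",
    "    int i;  /* loop counter */",
    "}",
]
code_only, notes = strip_comments(source)
# Stand-in for the real translation step: main() is rewritten, the rest kept.
translated = ["public static void main(String[] args)"] + code_only[1:]
for line in reattach(translated, notes):
    print(line)
```

Because the comment is keyed on a description ("call or definition of main")
rather than on a token such as RPAREN or BLANK, it does not matter which
tokens survive into the AST. A known limitation of this sketch is that when
two lines share the same description, the first matching output line receives
all of their comments.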
