[antlr-interest] How do I preserve comments in a language to language translator

Howard Nasgaard nasgaard at ca.ibm.com
Wed Aug 11 14:03:33 PDT 2010


Jim,  It sounds like you understand what I need to do.  I will be happy 
with a 'best guess' approach.  What you describe is basically what I 
understood from the reading I've done.  What I think I am still missing is 
an example of the mechanics of doing this in the tree grammar.  Is it a 
matter of inserting code in each rule to examine each token, looking for 
comment nodes (assume a unique channel for those)?  Would you track the 
index of the last node checked so that you can get the range of tokens to 
examine?  It sounds like this could get a bit messy. 

Howard W. Nasgaard
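Howard's "track the index of the last node checked" idea can be sketched as a cursor that remembers how far into the token stream it has already looked. This is one possible approach, not something the thread spells out. `Tok`, `CommentCursor`, `take` and the use of channel 2 for comments are all hypothetical names and assumptions for this sketch, not ANTLR3 runtime API. Each visited tree node reports the input-stream index of its token, and the cursor returns every comment lying between the previous visit and this one, so no rule has to scan individual tokens and no comment is emitted twice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a lexed token (not the ANTLR3 runtime type).
struct Tok {
    int channel;       // 0 = default channel
    std::string text;  // original source text
};

constexpr int COMMENT_CHANNEL = 2;  // assumption: comments sent to channel 2

// Hypothetical helper: remembers the first token index not yet examined,
// so each comment is handed out once, to the first node visited after it.
class CommentCursor {
public:
    explicit CommentCursor(const std::vector<Tok>& tokens) : tokens_(tokens) {}

    // Call when visiting a tree node whose token came from input `index`.
    // Returns the comments between the previous visit and this token.
    std::vector<std::string> take(std::size_t index) {
        assert(index < tokens_.size());
        std::vector<std::string> found;
        for (std::size_t i = next_; i < index; ++i) {
            if (tokens_[i].channel == COMMENT_CHANNEL) {
                found.push_back(tokens_[i].text);
            }
        }
        // Never move backwards: nodes visited out of source order get nothing.
        if (index + 1 > next_) {
            next_ = index + 1;
        }
        return found;
    }

private:
    const std::vector<Tok>& tokens_;
    std::size_t next_ = 0;  // first index not yet examined
};
```

The trade-off is that comments follow source order: if the translation reorders constructs, a comment lands before whichever node is visited first after it, which fits the 'best guess' approach described above.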



antlr-interest-bounces at antlr.org wrote on 11/08/2010 04:46:12 PM:

> Re: [antlr-interest] How do I preserve comments in a language to 
> language translator
> 
> From: Jim Idle
> To: antlr-interest
> Date: 11/08/2010 04:48 PM
> Sent by: antlr-interest-bounces at antlr.org
> 
> This is a very tricky thing to do perfectly, but not so difficult to do as
> a 'best guess' type of algorithm. It is manageable, for instance, if the
> comments are found before certain tokens and can simply be pushed to the
> output ahead of the translated version (like Doxygen or Javadoc comments),
> or if 'comments close by' is a reasonable guess. It is difficult to speak
> to your problem generically: some translations make this easy enough and
> some make it very difficult.
> 
> However, what you will need to do is locate the token that 'starts' your
> construct's output, then find its equivalent token position in the
> original tokenized input stream. If the token in the tree is from the
> original input stream then this is easy; otherwise you can use the user1,
> user2 and user3 fields of a token to record the token that 'starts' the
> code you have translated, or perhaps the start and end tokens of the
> comment block.
> 
> Now, knowing the input token position, you can traverse backwards in the
> token stream (use get and not LT, as LT skips off-channel tokens) and find
> the first of the comment tokens that precedes it (by checking each token's
> channel). This will be easier if you send the comments to a dedicated
> channel and not just HIDDEN (which is channel 99). Once you know the token
> position of the first comment token, you can traverse forwards and copy
> the token text to the output (changing the comment lead-in characters
> should you need to), using the pointers available in the token (which
> point to the original text).
> 
> So, you just need to get familiar with asking the tree nodes for their
> tokens and then asking the tokens what index they are and using the get
> methods to access the tokens in the input stream.
> 
> So:
> 
> // A comment
> // Another
> // yet another
> int Cfunc( ....
> 
> So, if the comments are going on channel 2 then you will have:
> 
> 0 COMMENT 
> 1 COMMENT 
> 2 COMMENT 
> 3 ID 
> 4 ID 
> 5 LPAREN 
> 
> Now, your first parser is probably going to generate ^(FUNCDECL ID ID .....)
> 
> You can now attach the index of the first comment (0) to user1 and the
> index of the last comment (2) to user2 of, say, FUNCDECL or the first ID.
> Assuming that the token is preserved through all the rewrites, this
> information will propagate to your final AST.
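A sketch of how the recorded range might travel from the first parser to the pretty printer. The `Tok` struct only borrows the user1/user2/user3 field names mentioned above; it is not the runtime's token structure. `recordComments`, `synthesize` and `emit` are hypothetical, comments on channel 2 are an assumption, and using a nonzero user3 to mean "a range was recorded" is a convention of this sketch (needed because 0 is a valid comment index, as in the example above). `synthesize` illustrates the case where the translation walk builds a new token instead of reusing the original one.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token type: only the user1/user2/user3 names follow the text.
struct Tok {
    int channel = 0;
    std::string text;
    std::uint32_t user1 = 0;  // index of the first comment token
    std::uint32_t user2 = 0;  // index of the last comment token
    std::uint32_t user3 = 0;  // sketch convention: nonzero = range recorded
};

constexpr int COMMENT_CHANNEL = 2;  // assumption: comments sent to channel 2

// First parser: record the comment range on the anchor token
// (e.g. the token behind FUNCDECL, or the first ID).
void recordComments(Tok& anchor, std::uint32_t first, std::uint32_t last) {
    anchor.user1 = first;
    anchor.user2 = last;
    anchor.user3 = 1;
}

// Translation walk: a token created for the new tree inherits the range
// of the original token, so the information survives the rewrite.
Tok synthesize(const std::string& text, const Tok& from) {
    Tok t;
    t.text = text;
    t.user1 = from.user1;
    t.user2 = from.user2;
    t.user3 = from.user3;
    return t;
}

// Pretty-print walk: copy the recorded comments from the original token
// stream, then print the node's own text.
void emit(std::ostream& out, const Tok& t, const std::vector<Tok>& stream) {
    if (t.user3 != 0) {
        assert(t.user1 <= t.user2 && t.user2 < stream.size());
        for (std::uint32_t i = t.user1; i <= t.user2; ++i) {
            if (stream[i].channel == COMMENT_CHANNEL) {
                out << stream[i].text << '\n';
            }
        }
    }
    out << t.text << '\n';
}
```

For the `Cfunc` example, calling `recordComments(stream[3], 0, 2)` in the first pass and later emitting a token synthesized from `stream[3]` prints the three comment lines before the translated text.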
> 
> Of course, this just illustrates what you need to do in general, as I do
> not know exactly what you are trying to do.
> 
> Jim
> 
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Howard Nasgaard
> > Sent: Wednesday, August 11, 2010 1:13 PM
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] How do I preserve comments in a language to language translator
> > 
> > I am writing a translator that will convert from one version of a
> > language to a newer version of that language. The versions are
> > syntactically similar, so their underlying ASTs are similar. I am using
> > parsers and tree grammars generated as C++. The old language is parsed
> > and an AST is built. Then numerous walks of the AST are done using
> > generated tree grammars. One of the walks creates a new AST, the
> > translation, which conforms to the tree hierarchy that describes the new
> > language elements. A final walk of the new AST "pretty prints" the
> > translation.
> > 
> > As part of the translation walk, or whatever works, I would like to copy
> > as many of the comment tokens across to the new AST as possible. Based
> > on my reading, the comments are still in the token stream because they
> > are directed to the HIDDEN channel. It is just not clear how, in my tree
> > grammar, I would access them. I have been unable to find any
> > descriptions of how to do this that apply to antlr3 and C++.
> > 
> > Howard W. Nasgaard
> > 
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 

