[antlr-interest] Preserving Comments

Jens Boeykens jens.boeykens at gmail.com
Mon Jul 28 01:40:31 PDT 2008


I managed to print out some of the comments with this changed code in
ANTLRv3Tree.g:

alternative
@after {
  CommonTree t = $alternative.start;
  int stop = t.getTokenStopIndex();
  CommonTokenStream tokens = (CommonTokenStream) input.getTokenStream();
  // Advance stop to the last token on the same line as the alternative's end.
  while (stop + 1 < tokens.size()
         && tokens.get(stop + 1).getLine() == tokens.get(t.getTokenStopIndex()).getLine())
    stop++;
  List<Token> comments =
      tokens.getTokens(t.getTokenStopIndex(), stop, ANTLRv3Parser.SL_COMMENT);
  // getTokens can return null when no token matches, so guard before iterating.
  if (comments != null)
    for (Token i : comments)
      System.out.println(i.getText());
}
    :   ^(ALT (e+=element)+ EOA)
        -> alternative_elements(alt={$ALT}, element={$e}, eoa={$EOA})
    |   ^(ALT EPSILON EOA)
        -> alternative_epsilon(alt={$ALT}, epsilon={$EPSILON}, eoa={$EOA})
    ;

This code looks for a comment on the same line after each alternative and
prints it. For example, for the input:

grammar T;
r : b // comm 1\n ;
r1 : b // comm 2\n ;

It prints out:

// comm 1

// comm 2
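
The same scan can be tried outside ANTLR. What follows is only a minimal
sketch in plain Java: the Tok class, the token type constants and the
sample() token list are hypothetical stand-ins, not the ANTLR runtime API.
It shows the idea used in the action above: start at the alternative's stop
token and collect every SL_COMMENT that follows it on the same line.

```java
import java.util.ArrayList;
import java.util.List;

class TrailingComments {
    // Hypothetical token types; real values come from the generated parser.
    static final int ID = 1, COLON = 2, SEMI = 3, SL_COMMENT = 5;

    // Hypothetical stand-in for an ANTLR token: only what the scan needs.
    static final class Tok {
        final int type;
        final int line;
        final String text;

        Tok(int type, int line, String text) {
            this.type = type;
            this.line = line;
            this.text = text;
        }
    }

    // Collect SL_COMMENT tokens that follow index `stop` on the same line.
    static List<String> trailingComments(List<Tok> tokens, int stop) {
        List<String> found = new ArrayList<>();
        int line = tokens.get(stop).line;
        for (int i = stop + 1; i < tokens.size() && tokens.get(i).line == line; i++) {
            if (tokens.get(i).type == SL_COMMENT) {
                found.add(tokens.get(i).text);
            }
        }
        return found;
    }

    // Token list for:  r : b // comm 1 <newline> ;
    static List<Tok> sample() {
        List<Tok> t = new ArrayList<>();
        t.add(new Tok(ID, 1, "r"));
        t.add(new Tok(COLON, 1, ":"));
        t.add(new Tok(ID, 1, "b"));
        t.add(new Tok(SL_COMMENT, 1, "// comm 1\n"));
        t.add(new Tok(SEMI, 2, ";"));
        return t;
    }

    public static void main(String[] args) {
        // Index 2 is `b`, the last token of the alternative.
        for (String c : trailingComments(sample(), 2)) {
            System.out.print(c); // the comment text already ends in a newline
        }
    }
}
```

The blank lines in the output above presumably come from the comment token
text already containing the line break, with println adding a second one.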

But now I want to put these comments back into the output grammar. I assume
this means adding some template code to insert the comment. But how exactly
is this done?

   - Do I need to add something to the template rules
   alternative_elements(alt={$ALT}, element={$e}, eoa = {$EOA}) and
   alternative_epsilon(alt={$ALT}, epsilon={$EPSILON}, eoa = {$EOA})?
   - If so, will this work if the code is put in an @after action?
   - Doing it this way implies that I have to change a lot of template
   rules, which is not very handy.
   - To recover all the comments this way, I have to put code in almost
   every rule in ANTLRv3Tree.g to look at the tokens near the rule and check
   whether any of them is a comment. This undoes the advantage of the hidden
   channel, namely that comments need not be included in the tree, which
   keeps the parser simple. In effect we would be moving all the complexity
   from the parser to the walker.
   - Of course I could be looking at it the wrong way, and I should recover
   the comments differently. If so, can someone tell me how?
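
One direction I can imagine, sketched here in plain Java under assumptions
and not taken from the ANTLR runtime: walk the token list once and index
every comment by the position of the on-channel token that precedes it. A
rule in the walker would then only need its stop token index to look up its
comments and could hand them to the template as one extra attribute. The Tok
class, the channel constants and the sample() list are hypothetical
stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CommentIndex {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 99; // assumed value, only used by this sketch

    // Hypothetical stand-in for an ANTLR token.
    static final class Tok {
        final int channel;
        final boolean isComment;
        final String text;

        Tok(int channel, boolean isComment, String text) {
            this.channel = channel;
            this.isComment = isComment;
            this.text = text;
        }
    }

    // Map: index of the preceding on-channel token -> comments that follow it.
    // Key -1 holds comments that appear before the first on-channel token.
    static Map<Integer, List<String>> build(List<Tok> tokens) {
        Map<Integer, List<String>> byPrevious = new LinkedHashMap<>();
        int previous = -1;
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            if (t.channel == DEFAULT_CHANNEL) {
                previous = i;
            } else if (t.isComment) {
                byPrevious.computeIfAbsent(previous, k -> new ArrayList<>()).add(t.text);
            }
        }
        return byPrevious;
    }

    // Token list for:  r : b // comm 1 <newline> ;
    static List<Tok> sample() {
        List<Tok> t = new ArrayList<>();
        t.add(new Tok(DEFAULT_CHANNEL, false, "r"));
        t.add(new Tok(DEFAULT_CHANNEL, false, ":"));
        t.add(new Tok(DEFAULT_CHANNEL, false, "b"));
        t.add(new Tok(HIDDEN_CHANNEL, false, " "));        // whitespace, ignored
        t.add(new Tok(HIDDEN_CHANNEL, true, "// comm 1")); // comment after `b`
        t.add(new Tok(DEFAULT_CHANNEL, false, ";"));
        return t;
    }

    public static void main(String[] args) {
        // The comment is filed under index 2, the position of `b`.
        System.out.println(build(sample()));
    }
}
```

Whether this is less work than per-rule lookups depends on how the templates
are written; it only moves the token scanning into one place.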

Jens

2008/7/26 Jim Idle <jimi at temporal-wave.com>

> Each node of the tree has start and stop tokens. If you use getRange or
> get, you can retrieve any token, on channel or not, so use this at strategic
> points in your tree grammar to get at comments.
>
> Jim
>
>
> ------------------------------
> *From*: "Jens Boeykens" <jens.boeykens at gmail.com>
> *Date*: Fri, 25 Jul 2008 12:07:56 +0200
> *To*: <antlr-interest at antlr.org>
> *Subject*: Re: [antlr-interest] Preserving Comments
> I want to do a similar thing. I'm using slightly adapted versions of
> ANTLRv3.g <http://www.antlr.org/grammar/ANTLR/ANTLRv3.g> and ANTLRv3Tree.g
> <http://www.antlr.org/grammar/ANTLR/ANTLRv3Tree.g> to regenerate ANTLRv3
> grammars. I have extended the walker (ANTLRv3Tree) with template rewrite
> rules to regenerate the original ANTLR grammar, parsed with ANTLRv3.g. It
> works great, except for the comments. These are placed in a hidden channel,
> so they are not seen by the walker and thus not passed to a template rewrite
> rule. I realize it is not appropriate to place the comments in the tree,
> because comments can appear anywhere and this would make the parser too
> complex. But how exactly do I tell my walker to look for tokens in the
> hidden channel or a self-defined channel? Can someone give an example,
> because I really don't know where to begin or what syntax to use?
>
> "you can search within the start and stop tokens for the AST rule and find
> anything on channel 2, doing with it as you will."
>
> How and where exactly do I need to do this? In ANTLRv3Tree.g itself, and if
> so with what syntax? Or is it done somewhere else in Java code? I thought an
> AST rule didn't have a stop token, only a start token?
>
> An example of what I'm trying to do:
> parsing of a grammar:       r: INTEGER ; //comments
> ANTLRv3.g makes a tree:     (RULE r (BLOCK (ALT INTEGER EOA) EOB) EOR)
> Now from this tree I reconstruct the grammar, but I get
>     r: INTEGER ;
> thus the comments are gone.
> When I walk this tree in ANTLRv3Tree.g the rule "rule" is used. Should I
> add something to "rule" in ANTLRv3Tree.g?
>
> Sorry if this is a basic question, but an example would make things much
> clearer.
>
> Jens
>
> 2008/7/14 Jim Idle <jimi at temporal-wave.com>:
>
>>  On Mon, 2008-07-14 at 12:43 +0530, nilesh.kapile at tcs.com wrote:
>>
>>
>> Hello,
>>
>> I need to preserve comments and want to collect it in some data structure.
>>  How can we do that in ANTLR?
>>
>> Currently, I've following rule for comments:
>>
>> LINE_COMMENT
>>     :  '//' ~('\n'|'\r')* '\r'? '\n'   {$channel=HIDDEN;}
>>     ;
>>
>> The easiest way is to use your own channel, say 2, which means the parser
>> will ignore them but they are preserved in the input stream (actually they
>> are when HIDDEN too, but that really means 'anything you want to hide', and
>> you specifically want to inspect comments). Then, when you walk your tree
>> (assuming you are using a tree, which is best), at any point where the
>> comments are required, you can search within the start and stop tokens for
>> the AST rule and find anything on channel 2, doing with it as you will. You
>> can also do this from the parser of course.
>>
>> The other option is to pass the COMMENT token through as a normal token,
>> and create parser rules to assemble them at various points. The problem here
>> comes when the COMMENT can be anywhere, such as in the middle of
>> expressions, so the parser ends up having the COMMENT token everywhere and
>> complicates your grammar enormously.
>>
>> Jim
>>
>
>
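
For reference, the search between a node's start and stop tokens that Jim
describes could look roughly like the sketch below. It is plain Java with a
hypothetical Tok class standing in for the real token type, and channel 2 is
simply the number suggested in the quoted reply; nothing here is the ANTLR
runtime API.

```java
import java.util.ArrayList;
import java.util.List;

class ChannelScan {
    static final int DEFAULT_CHANNEL = 0;
    static final int COMMENT_CHANNEL = 2; // the custom channel from the quoted reply

    // Hypothetical stand-in for an ANTLR token.
    static final class Tok {
        final int channel;
        final String text;

        Tok(int channel, String text) {
            this.channel = channel;
            this.text = text;
        }
    }

    // Text of every token on `channel` whose index lies in [start, stop].
    static List<String> onChannel(List<Tok> tokens, int start, int stop, int channel) {
        List<String> out = new ArrayList<>();
        for (int i = Math.max(start, 0); i <= stop && i < tokens.size(); i++) {
            if (tokens.get(i).channel == channel) {
                out.add(tokens.get(i).text);
            }
        }
        return out;
    }

    // Token list for:  r : /* c */ b ;
    static List<Tok> sample() {
        List<Tok> t = new ArrayList<>();
        t.add(new Tok(DEFAULT_CHANNEL, "r"));
        t.add(new Tok(DEFAULT_CHANNEL, ":"));
        t.add(new Tok(COMMENT_CHANNEL, "/* c */"));
        t.add(new Tok(DEFAULT_CHANNEL, "b"));
        t.add(new Tok(DEFAULT_CHANNEL, ";"));
        return t;
    }

    public static void main(String[] args) {
        // Scan the whole rule, token indexes 0 through 4.
        System.out.println(onChannel(sample(), 0, 4, COMMENT_CHANNEL));
    }
}
```

In a tree grammar the start and stop indexes would come from the node being
walked; here they are passed in by hand.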