[antlr-interest] Preserving Comments

Jens Boeykens jens.boeykens at gmail.com
Mon Jul 28 02:44:08 PDT 2008


Found a solution to get the comments in the template, but I'm not sure if it
is the recommended way.

alternative
@after {
  // $alternative.start is the tree node matched by this rule.
  CommonTree t = $alternative.start;
  int last = t.getTokenStopIndex();
  if (last >= 0) {
      CommonTokenStream tokens = (CommonTokenStream) input.getTokenStream();
      int line = tokens.get(last).getLine();
      // Extend the range over the following tokens on the same line.
      int stop = last;
      while (stop + 1 < tokens.size() && tokens.get(stop + 1).getLine() == line)
          stop++;
      List<Token> comments = tokens.getTokens(last, stop, ANTLRv3Parser.SL_COMMENT);
      // Guard against an empty result as well as null before calling get(0).
      if (comments != null && !comments.isEmpty()) {
          retval.st.setAttribute("comment", comments.get(0).getText());
      }
  }
}
    :   ^(ALT (e+=element)+ EOA)
        -> alternative_elements(alt={$ALT}, element={$e}, eoa = {$EOA})
    |   ^(ALT EPSILON EOA)
        -> alternative_epsilon(alt={$ALT}, epsilon={$EPSILON}, eoa = {$EOA})
    ;
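
The same-line lookup used in the action above can also be written as a small standalone helper, which makes it easier to check outside the grammar. This is only a minimal sketch: Tok and TrailingComments are hypothetical stand-ins invented for illustration (they are not ANTLR runtime classes), and the SL_COMMENT value is an assumed token type.

```java
import java.util.List;

/** Hypothetical stand-in for a token; NOT the ANTLR runtime's Token class. */
final class Tok {
    final int type;
    final int line;
    final String text;

    Tok(int type, int line, String text) {
        this.type = type;
        this.line = line;
        this.text = text;
    }
}

final class TrailingComments {
    static final int OTHER = 0;      // assumed type for any non-comment token
    static final int SL_COMMENT = 1; // assumed type for single-line comments

    /**
     * Starting at stopIndex, walks over the tokens that sit on the same line
     * as tokens[stopIndex] and returns the text of the first comment found,
     * or null if there is none (or if stopIndex is out of range).
     */
    static String firstCommentOnSameLine(List<Tok> tokens, int stopIndex) {
        if (stopIndex < 0 || stopIndex >= tokens.size()) {
            return null;
        }
        int line = tokens.get(stopIndex).line;
        for (int i = stopIndex; i < tokens.size() && tokens.get(i).line == line; i++) {
            Tok t = tokens.get(i);
            if (t.type == SL_COMMENT) {
                return t.text;
            }
        }
        return null;
    }
}
```

Returning null when nothing is found keeps the caller's check to a single comparison before setting the "comment" attribute.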

2008/7/28 Jens Boeykens <jens.boeykens at gmail.com>

> I managed to print out some of the comments with this changed code in
> ANTLRv3Tree.g:
>
> alternative
> @after {
>   CommonTree t = $alternative.start;
>   int stop = t.getTokenStopIndex();
>   CommonTokenStream tokens = (CommonTokenStream) input.getTokenStream();
>   while(stop + 1 < tokens.size() &&
>         tokens.get(stop + 1).getLine() == tokens.get(t.getTokenStopIndex()).getLine())
>         stop++;
>   List<Token> comments =
>       tokens.getTokens(t.getTokenStopIndex(), stop, ANTLRv3Parser.SL_COMMENT);
>   for(Token i : comments)
>      System.out.println(i.getText());
> }
>     :   ^(ALT (e+=element)+ EOA)
>         -> alternative_elements(alt={$ALT}, element={$e}, eoa = {$EOA})
>     |   ^(ALT EPSILON EOA)
>         -> alternative_epsilon(alt={$ALT}, epsilon={$EPSILON}, eoa = {$EOA})
>     ;
>
> This code looks for a comment after each alternative and prints it out. For
> example, for the input:
>
> grammar T;
> r : b // comm 1\n ;
> r1 : b // comm 2\n ;
>
> It prints out:
>
> // comm 1
>
> // comm 2
>
> But now I want to put these comments back into the output grammar. I assume
> this means adding some template code to insert the comment. But how
> exactly is this done?
>
>    - Do I need to add something to the template rules
>    alternative_elements(alt={$ALT}, element={$e}, eoa = {$EOA}) and
>    alternative_epsilon(alt={$ALT}, epsilon={$EPSILON}, eoa = {$EOA})?
>    - If so, will this work if the code is put in an @after action?
>    - Doing it this way implies that I have to change a lot of template
>    rules, which is not very handy.
>    - To recover all the comments this way, I have to put code in almost
>    every rule in ANTLRv3Tree.g to look for the tokens near the rule and check
>    if it is a comment. This kind of undoes the advantage of the hidden channel
>    that you do not have to include comments in the tree in order to keep the
>    parser simple. In fact now we would be moving all the complexity towards the
>    walker instead of the parser.
>    - Of course, I could be looking at it the wrong way and should recover
>    the comments a different way. If so, can someone tell me how?
>
> Jens
>
> 2008/7/26 Jim Idle <jimi at temporal-wave.com>
>
>> Each node of the tree has start and stop tokens. With getRange or get
>> you can retrieve any token, on channel or not, so use this at strategic
>> points in your tree grammar to get at comments.
>>
>> Jim
>>
>> ------------------------------
>> *From*: "Jens Boeykens" <jens.boeykens at gmail.com>
>> *Date*: Fri, 25 Jul 2008 12:07:56 +0200
>> *To*: <antlr-interest at antlr.org>
>> *Subject*: Re: [antlr-interest] Preserving Comments
>> I want to do a similar thing. I'm using slightly adapted versions of
>> ANTLRv3.g <http://www.antlr.org/grammar/ANTLR/ANTLRv3.g> and
>> ANTLRv3Tree.g <http://www.antlr.org/grammar/ANTLR/ANTLRv3Tree.g> to
>> regenerate ANTLRv3 grammars. I have extended the walker (ANTLRv3Tree) with
>> template rewrite rules to regenerate the original antlr grammar, parsed with
>> antlrv3.g. It works great, except for the comments. These are placed in a
>> hidden channel and are not seen by the walker and thus not given to a
>> template rewrite rule. I realize it is not appropriate to place the comments
>> in the tree, because comments can appear anywhere and this would make the
>> parser too complex. But how exactly do I tell my walker to look for tokens in
>> the hidden channel or a self-defined channel? Can someone give an example,
>> because I really don't know where to begin or what syntax to use?
>>
>> "you can search within the start and stop tokens for the AST rule and find
>> anything on channel 2, doing with it as you will."
>>
>> How and where exactly do I need to do this? In ANTLRv3Tree.g itself, and if
>> so, with what syntax? Or is it done somewhere else in Java code? I thought an
>> AST rule didn't have a stop token, only a start token?
>>
>> An example of what I'm trying to do:
>> Parsing a grammar:            r: INTEGER ; //comments
>> ANTLRv3.g makes a tree:       (RULE r (BLOCK (ALT INTEGER EOA) EOB) EOR)
>> From this tree I reconstruct the grammar, but I get:   r: INTEGER ;
>> so the comments are gone.
>> When I walk this tree in ANTLRv3Tree.g the rule "rule" is used. Should I
>> add something to "rule" in ANTLRv3Tree.g?
>>
>> Sorry if this is a basic question, but an example would make things much
>> clearer.
>>
>> Jens
>>
>> 2008/7/14 Jim Idle <jimi at temporal-wave.com>:
>>
>>>  On Mon, 2008-07-14 at 12:43 +0530, nilesh.kapile at tcs.com wrote:
>>>
>>>
>>> Hello,
>>>
>>> I need to preserve comments and want to collect them in some data
>>> structure.  How can we do that in ANTLR?
>>>
>>> Currently, I have the following rule for comments:
>>>
>>> LINE_COMMENT
>>>     :  '//' ~('\n'|'\r')* '\r'? '\n'   {$channel=HIDDEN;}
>>>     ;
>>>
>>> The easiest way is to use your own channel, say 2, which means the
>>> parser will ignore them but they are preserved in the input stream (actually
>>> they are when HIDDEN too, but that really means 'anything you want to hide',
>>> and you specifically want to inspect comments). Then, when you walk your tree
>>> (assuming you are using a tree, but that is best), at any point where the
>>> comments are required, you can search within the start and stop tokens for
>>> the AST rule and find anything on channel 2, doing with it as you will. You
>>> can also do this from the parser of course.
>>>
>>> The other option is to pass the COMMENT token through as a normal token,
>>> and create parser rules to assemble them at various points. The problem here
>>> comes when the COMMENT can be anywhere, such as in the middle of
>>> expressions, so the parser ends up having the COMMENT token everywhere and
>>> complicates your grammar enormously.
>>>
>>> Jim
>>>
>>
>>
>
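
Jim's suggestion in the oldest message (put comments on their own channel, then search between a node's start and stop token indexes for anything on that channel) could look roughly like the following. This is a sketch under stated assumptions, not the ANTLR API: ChannelTok and ChannelScan are hypothetical stand-ins made up for illustration, and channel 2 is simply the number used in the message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a lexer token; not an ANTLR runtime class. */
final class ChannelTok {
    final int channel;
    final String text;

    ChannelTok(int channel, String text) {
        this.channel = channel;
        this.text = text;
    }
}

final class ChannelScan {
    static final int DEFAULT_CHANNEL = 0; // assumed channel for tokens the parser sees
    static final int COMMENT_CHANNEL = 2; // channel number taken from the message

    /**
     * Collects the text of every token on the given channel whose index lies
     * in [start, stop] (inclusive). Out-of-range bounds are clamped to the list.
     */
    static List<String> textOnChannel(List<ChannelTok> tokens, int start, int stop, int channel) {
        List<String> found = new ArrayList<String>();
        int from = Math.max(start, 0);
        int to = Math.min(stop, tokens.size() - 1);
        for (int i = from; i <= to; i++) {
            ChannelTok t = tokens.get(i);
            if (t.channel == channel) {
                found.add(t.text);
            }
        }
        return found;
    }
}
```

In a tree grammar the two bounds would come from the node's token start and stop indexes. As the follow-up messages show, a trailing comment can sit just past the stop index, so the range may need to be widened first.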