[antlr-interest] Preserving Comments

Jens Boeykens jens.boeykens at gmail.com
Fri Jul 25 03:07:56 PDT 2008


I want to do a similar thing. I'm using slightly adapted versions of
ANTLRv3.g <http://www.antlr.org/grammar/ANTLR/ANTLRv3.g> and
ANTLRv3Tree.g <http://www.antlr.org/grammar/ANTLR/ANTLRv3Tree.g> to
regenerate ANTLRv3 grammars. I have extended the walker (ANTLRv3Tree)
with template rewrite rules that regenerate the original ANTLR grammar as
parsed by ANTLRv3.g. It works great, except for the comments. These are placed
on a hidden channel, so the walker never sees them and they never reach a
template rewrite rule. I realize it is not appropriate to put the comments
in the tree, because comments can appear anywhere and this would make the
parser too complex. But how exactly do I tell my walker to look for tokens on
the hidden channel or on a self-defined channel? Can someone give an example?
I really don't know where to begin or what syntax to use.

"you can search within the start and stop tokens for the AST rule and find
anything on channel 2, doing with it as you will."

How and where exactly do I need to do this? In ANTLRv3Tree.g itself, and if
so, with what syntax? Or is it done somewhere else, in Java code? I thought an
AST rule only had a start token, not a stop token?

An example of what I'm trying to do:
Parsing a grammar:              r: INTEGER ; //comments
ANTLRv3.g makes a tree:         (RULE r (BLOCK (ALT INTEGER EOA) EOB) EOR)
Reconstructing from this tree:  r: INTEGER ;
So the comments are gone.
When I walk this tree in ANTLRv3Tree.g, the rule "rule" is used. Should I add
something to "rule" in ANTLRv3Tree.g?
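
A minimal sketch, in plain Java, of the kind of search the quoted advice
describes, applied to the r: INTEGER ; //comments example. Tok, sample,
offChannel and trailing are hypothetical stand-ins so that the snippet runs
without the ANTLR runtime. With ANTLR 3 the corresponding calls would, as far
as I can tell, be CommonTree.getTokenStartIndex() / getTokenStopIndex() for the
range and TokenStream.get(i).getChannel() for the test. Channel 2 for comments
is an assumption taken from Jim's suggestion, and the sketch assumes the rule's
last token is the ';'. In that case the comment lies after the stop token, so a
forward scan past the stop index is needed as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HiddenTokenSketch {
    static final int DEFAULT_CHANNEL = 0;
    static final int COMMENT_CHANNEL = 2;  // assumed: comments sent to channel 2
    static final int HIDDEN_CHANNEL = 99;  // ANTLR 3's HIDDEN channel (whitespace here)

    // Hypothetical stand-in for an ANTLR token: just text plus channel.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Token stream for:  r: INTEGER ; //comments
    static List<Tok> sample() {
        return Arrays.asList(
            new Tok("r", DEFAULT_CHANNEL), new Tok(":", DEFAULT_CHANNEL),
            new Tok(" ", HIDDEN_CHANNEL), new Tok("INTEGER", DEFAULT_CHANNEL),
            new Tok(" ", HIDDEN_CHANNEL), new Tok(";", DEFAULT_CHANNEL),
            new Tok(" ", HIDDEN_CHANNEL), new Tok("//comments", COMMENT_CHANNEL));
    }

    // Texts of tokens on `channel` whose index lies in [start, stop].
    static List<String> offChannel(List<Tok> tokens, int start, int stop, int channel) {
        List<String> found = new ArrayList<String>();
        for (int i = Math.max(start, 0); i <= stop && i < tokens.size(); i++) {
            if (tokens.get(i).channel == channel) found.add(tokens.get(i).text);
        }
        return found;
    }

    // Texts of tokens on `channel` that follow index `stop`,
    // stopping at the next token on the default channel.
    static List<String> trailing(List<Tok> tokens, int stop, int channel) {
        List<String> found = new ArrayList<String>();
        for (int i = stop + 1; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            if (t.channel == DEFAULT_CHANNEL) break;
            if (t.channel == channel) found.add(t.text);
        }
        return found;
    }

    public static void main(String[] args) {
        List<Tok> toks = sample();
        int start = 0, stop = 5;  // indexes of "r" and ";" in the stream
        System.out.println("inside rule: " + offChannel(toks, start, stop, COMMENT_CHANNEL));
        System.out.println("after rule:  " + trailing(toks, stop, COMMENT_CHANNEL));
    }
}
```

The collected comment text could then be passed to the template as an extra
attribute; how best to wire that into the "rule" rule of ANTLRv3Tree.g is
left open here.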

Sorry if this is a basic question, but an example would make things much
clearer.

Jens

2008/7/14 Jim Idle <jimi at temporal-wave.com>:

>  On Mon, 2008-07-14 at 12:43 +0530, nilesh.kapile at tcs.com wrote:
>
>
> Hello,
>
> I need to preserve comments and want to collect them in some data structure.
>  How can we do that in ANTLR?
>
> Currently, I have the following rule for comments:
>
> LINE_COMMENT
>     :  '//' ~('\n'|'\r')* '\r'? '\n'   {$channel=HIDDEN;}
>     ;
>
> The easiest way is to use your own channel, say 2, which means the parser
> will ignore them but they are preserved in the input stream (actually they
> are when HIDDEN too, but that really means 'anything you want to hide' and
> you specifically want to inspect comments). Then, when you walk your tree
> (assuming you are using a tree, which is best), at any point where the
> comments are required, you can search within the start and stop tokens for
> the AST rule and find anything on channel 2, doing with it as you will. You
> can also do this from the parser of course.
>
> The other option is to pass the COMMENT token through as a normal token,
> and create parser rules to assemble them at various points. The problem here
> comes when the COMMENT can be anywhere, such as in the middle of
> expressions, so the parser ends up having the COMMENT token everywhere and
> complicates your grammar enormously.
>
> Jim
>