[antlr-interest] "Comments" token from source to the target language

Mon Nov 12 19:56:05 PST 2007

Mateus Baur da Silva wrote:
> As I mentioned in one of my other emails, I am writing a translator 
> from a Pascal subset to Java. Currently, I'm ignoring the comments by 
> using skip() on the lexer rule that defines them.
> 
> However, I would like to translate the comments from Pascal to Java code 
> as well. I was wondering if I could do that by using the HIDDEN_CHANNEL 
> or some other feature to properly translate the comments. Does someone 
> have any clue on how to do that?

Another way to look at this is to consider input vs. output. In a 
programming language parser, you parse the input source text into 
units that can be compiled or executed. In that context, comments have 
no meaning and are skipped or routed to the HIDDEN token channel.

However, in your situation, you are translating one source language into 
another. In this context, comments not only have meaning, they are part 
of the output. As such, they should be handled by the parser as part of 
the source language and not punted by the lexer.

Concretely, the components of Pascal comments become ordinary tokens, 
parser rules match the different Pascal comment syntaxes ({ ... } and 
(* ... *)) against those tokens, and you use the tokens holding the 
comment text to emit Java-style comments. This lets you distinguish 
between single-line and multi-line comments, and even prepend " * " to 
the interior lines of multi-line comments.
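
As a rough illustration of the emit step only, here is a minimal Java 
sketch. It assumes the parser hands over the full text of one Pascal 
comment, delimiters included. The class name PascalCommentTranslator and 
the method toJava are hypothetical helpers, not part of ANTLR, and 
support for "//" comments assumes a Delphi or Free Pascal dialect rather 
than standard Pascal.

```java
/**
 * Hypothetical helper: converts the text of one Pascal comment into an
 * equivalent Java comment. Not part of ANTLR; a sketch only.
 */
final class PascalCommentTranslator {

    private PascalCommentTranslator() {}

    /** Takes a complete Pascal comment (delimiters included), returns Java text. */
    static String toJava(String pascalComment) {
        // Assumption: "//" line comments exist in the source dialect.
        if (pascalComment.startsWith("//")) {
            return "// " + pascalComment.substring(2).trim();
        }

        // Remove { } or (* *), then defuse any "*/" that would otherwise
        // terminate the Java block comment early.
        String body = stripBlockDelimiters(pascalComment)
                .trim()
                .replace("*/", "* /");

        String[] lines = body.split("\\r?\\n", -1);
        if (lines.length == 1) {
            // Keep block form so a comment in the middle of a statement
            // does not swallow the code that follows it.
            return "/* " + lines[0] + " */";
        }

        // Multi-line: one " * " prefix per interior line.
        StringBuilder out = new StringBuilder("/*\n");
        for (String line : lines) {
            String trimmed = line.trim();
            out.append(trimmed.isEmpty() ? " *" : " * " + trimmed).append('\n');
        }
        return out.append(" */").toString();
    }

    private static String stripBlockDelimiters(String c) {
        if (c.length() >= 4 && c.startsWith("(*") && c.endsWith("*)")) {
            return c.substring(2, c.length() - 2);
        }
        if (c.length() >= 2 && c.startsWith("{") && c.endsWith("}")) {
            return c.substring(1, c.length() - 1);
        }
        throw new IllegalArgumentException("Not a Pascal comment: " + c);
    }
}
```

For example, toJava("{ increment the counter }") yields 
"/* increment the counter */", while a comment spanning two lines comes 
back as a "/*" line, one " * " line per source line, and a closing " */". 
In a real translator, a call like this would probably sit in the action 
or template attached to the parser rule that matches the comment tokens.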

In this manner, the comment tokens are a valid part of the main token 
stream and there is no need to use any special code to read alternative 
token stream channels.

I hope that helps.
-- Curtis