[antlr-interest] Grammar Critique: Preserving Certain Comments

Wed Apr 29 01:35:21 PDT 2009

At 17:52 29/04/2009, Michael Coupland wrote:
 >An example is probably clearest:
 >
 >	struct foo // ignored comment 'foo'
 >	{
 >		int a; // comment on 'a' that I want in the AST
 >		// ignored comment
 >		int b;
 >		// another ignored comment
 >	};
[...]
 >I'm particularly concerned that it doesn't intuitively extend to
 >parsing the 'foo' comment above, since my grammar relies on
 >MEMBER_COMMENT being prefixed with a semicolon so the lexer won't
 >throw it out as a CPP_COMMENT. Does anyone have any better ways to
 >solve this problem? Suggestions on how to match the 'foo' comment?

The way I'd try to do it is to set a flag on newline, leave the 
flag alone on whitespace, and clear the flag on 
non-whitespace.  If the flag is already clear when the '//' is 
encountered, then it's a comment-following-something-else and you 
produce your alternate token; if the flag is still set, the 
comment starts its own line.  (This might catch more cases than 
you intend, however.)

It's probably easier to do this by overriding nextToken rather 
than by trying to implement it in the lexer rules (because of the 
many places you'd need to clear the flag).
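
As a rough illustration of the flag idea, here is a standalone Java 
sketch.  It does not use ANTLR: rawNextToken() stands in for whatever 
the generated lexer matches, and the names Kind, Tok, 
FlagTrackingLexer and TRAILING_COMMENT are all made up for the 
example.  The point is only that the flag is maintained in one 
place, in nextToken(), and is "set" while nothing but whitespace has 
been seen since the last newline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token kinds for this sketch; these are not ANTLR names.
enum Kind { NEWLINE, WS, COMMENT, TRAILING_COMMENT, OTHER }

final class Tok {
    final Kind kind;
    final String text;
    Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
}

final class FlagTrackingLexer {
    private final String src;
    private int pos = 0;
    // Set on newline (and at start of input), cleared on non-whitespace.
    private boolean onlyWhitespaceSinceNewline = true;

    FlagTrackingLexer(String src) { this.src = src; }

    // Stand-in for the generated lexer: returns the next raw token, or null at EOF.
    private Tok rawNextToken() {
        if (pos >= src.length()) return null;
        int start = pos;
        char c = src.charAt(pos);
        if (c == '\n') { pos++; return new Tok(Kind.NEWLINE, "\n"); }
        if (c == ' ' || c == '\t' || c == '\r') {
            while (pos < src.length() && " \t\r".indexOf(src.charAt(pos)) >= 0) pos++;
            return new Tok(Kind.WS, src.substring(start, pos));
        }
        if (src.startsWith("//", pos)) {
            while (pos < src.length() && src.charAt(pos) != '\n') pos++;
            return new Tok(Kind.COMMENT, src.substring(start, pos));
        }
        // Anything else: consume up to whitespace or the start of a comment.
        while (pos < src.length()
                && " \t\r\n".indexOf(src.charAt(pos)) < 0
                && !src.startsWith("//", pos)) {
            pos++;
        }
        return new Tok(Kind.OTHER, src.substring(start, pos));
    }

    // The "override": all flag handling lives here rather than in each rule.
    Tok nextToken() {
        Tok t = rawNextToken();
        if (t == null) return null;
        switch (t.kind) {
            case NEWLINE:
                onlyWhitespaceSinceNewline = true;
                break;
            case WS:
                break; // leave the flag alone
            case COMMENT:
                if (!onlyWhitespaceSinceNewline) {
                    // Something else preceded the comment on this line.
                    t = new Tok(Kind.TRAILING_COMMENT, t.text);
                }
                onlyWhitespaceSinceNewline = false;
                break;
            default:
                onlyWhitespaceSinceNewline = false;
        }
        return t;
    }

    // Convenience for trying it out: collect the text of every trailing comment.
    static List<String> trailingComments(String src) {
        FlagTrackingLexer lexer = new FlagTrackingLexer(src);
        List<String> out = new ArrayList<String>();
        for (Tok t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            if (t.kind == Kind.TRAILING_COMMENT) out.add(t.text);
        }
        return out;
    }
}
```

Run over the struct example from the question, this reports both 
the comment on 'a' and the 'foo' comment as trailing comments, which 
is the "more cases than you intend" caveat in practice; the parser 
would still have to decide which of them it wants to keep.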

Yet another alternative is to put all the comments into another 
channel (not the hidden channel), and then check for tokens in 
that channel near the constructs in the main channel that you're 
interested in.
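
A minimal sketch of the channel approach, again in plain Java with 
no ANTLR dependency.  ChannelToken, CommentLookup, the channel 
numbers and the hand-built token list are all assumptions for 
illustration; in a real grammar the lexer would assign the channel 
and the token stream would hold the full list.  The idea is that the 
parser only sees the default channel, and once it has matched a 
member it scans forward in the unfiltered token list for a 
comment-channel token on the same line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type carrying a channel number, for this sketch only.
final class ChannelToken {
    static final int DEFAULT_CHANNEL = 0;  // what the parser sees
    static final int WS_CHANNEL = 1;       // whitespace and newlines (assumed)
    static final int COMMENT_CHANNEL = 2;  // comments, kept apart from whitespace (assumed)

    final int channel;
    final String text;

    ChannelToken(int channel, String text) {
        this.channel = channel;
        this.text = text;
    }
}

final class CommentLookup {
    // Returns the comment that follows tokens.get(index) on the same line,
    // or null if a newline or another default-channel token comes first.
    static String commentAfter(List<ChannelToken> tokens, int index) {
        for (int j = index + 1; j < tokens.size(); j++) {
            ChannelToken t = tokens.get(j);
            if (t.channel == ChannelToken.COMMENT_CHANNEL) return t.text;
            if (t.channel == ChannelToken.DEFAULT_CHANNEL) return null;
            if (t.text.indexOf('\n') >= 0) return null; // the line ended first
        }
        return null;
    }

    // Hand-built token list for:
    //   int a; // comment on 'a'
    //   // ignored comment
    //   int b;
    static List<ChannelToken> sampleTokens() {
        List<ChannelToken> ts = new ArrayList<ChannelToken>();
        ts.add(new ChannelToken(ChannelToken.DEFAULT_CHANNEL, "int"));               // 0
        ts.add(new ChannelToken(ChannelToken.WS_CHANNEL, " "));                      // 1
        ts.add(new ChannelToken(ChannelToken.DEFAULT_CHANNEL, "a"));                 // 2
        ts.add(new ChannelToken(ChannelToken.DEFAULT_CHANNEL, ";"));                 // 3
        ts.add(new ChannelToken(ChannelToken.WS_CHANNEL, " "));                      // 4
        ts.add(new ChannelToken(ChannelToken.COMMENT_CHANNEL, "// comment on 'a'")); // 5
        ts.add(new ChannelToken(ChannelToken.WS_CHANNEL, "\n"));                     // 6
        ts.add(new ChannelToken(ChannelToken.COMMENT_CHANNEL, "// ignored comment"));// 7
        ts.add(new ChannelToken(ChannelToken.WS_CHANNEL, "\n"));                     // 8
        ts.add(new ChannelToken(ChannelToken.DEFAULT_CHANNEL, "int"));               // 9
        ts.add(new ChannelToken(ChannelToken.WS_CHANNEL, " "));                      // 10
        ts.add(new ChannelToken(ChannelToken.DEFAULT_CHANNEL, "b"));                 // 11
        ts.add(new ChannelToken(ChannelToken.DEFAULT_CHANNEL, ";"));                 // 12
        ts.add(new ChannelToken(ChannelToken.WS_CHANNEL, "\n"));                     // 13
        return ts;
    }
}
```

With the sample list, looking after the first ';' (index 3) finds 
the comment on 'a', while looking after the second ';' (index 12) 
finds nothing, so the standalone comments are simply never picked 
up.  Because the decision is made from the parser side, the same 
lookup could be applied after the struct name to handle the 'foo' 
comment only where it is wanted.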