[antlr-interest] Grammar Critique: Preserving Certain Comments

Michael Coupland mcoupland at gmail.com
Tue Apr 28 22:52:15 PDT 2009


I am writing a parser for what is effectively a C declaration syntax,
and want to add a somewhat unusual feature. Namely, I'd like to
support C/C++ single-line comments, but in certain cases I'd like to
retain comments that immediately follow a declaration.

An example is probably clearest:

	struct foo // ignored comment 'foo'
	{
		int a; // comment on 'a' that I want in the AST
		// ignored comment
		int b;
		// another ignored comment
	};


I've written a grammar (below) that seems to work, but I have a
nagging feeling that it's not the best approach. I'm particularly
concerned that it doesn't extend naturally to capturing the 'foo'
comment above, since the grammar relies on MEMBER_COMMENT starting
with a semicolon so that the lexer won't discard it as a CPP_COMMENT.
Does anyone know a better way to solve this problem, or have
suggestions on how to match the 'foo' comment? Any feedback is
appreciated!

Thanks!
	Michael

---------------------------------------------------------------------

	grammar KeepSomeComments;
	
	root 	:	 declaration* EOF
		;
	
	declaration
		:	'struct' IDENTIFIER '{' member_declaration* '}' ';'
		;
	
	member_declaration
		:	type IDENTIFIER (';' | MEMBER_COMMENT)
		;
	
	type	:	'int'
		;
	
	IDENTIFIER
	 	:	('a'..'z'|'A'..'Z'|'_')+ ;
	
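	// A comment that directly follows a member's ';' on the same line.
	// The ';' is part of the token, so the parser sees it in place of ';'.
	MEMBER_COMMENT
		:	';' (' '|'\t')* '//' (~('\n'|'\r'))* NEWLINE
		;
	
	// Any other single-line comment goes to the hidden channel.
	CPP_COMMENT
		:	'//' (~('\n'|'\r'))* NEWLINE { $channel=HIDDEN; }
		;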
	
	fragment NEWLINE
		:	('\n'|'\r')
		;
	
	WS	:	(' '|'\t'|'\n'|'\r')+ { $channel=HIDDEN; }
		;
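
To make the semicolon dependence concrete, here is a rough Python
imitation of the lexer rules above, run on the example struct. It is
only a sketch: it assumes plain longest-match tokenization with regular
expressions, which may not match ANTLR's actual prediction in every
case, and the names TOKEN_SPEC, KEYWORD, PUNCT and tokenize are made up
for this illustration (the grammar itself uses implicit literal tokens
for 'struct', 'int', '{', '}' and ';').

```python
import re

# Regex stand-ins for the grammar's lexer rules. KEYWORD and PUNCT are
# hypothetical names for the grammar's implicit literal tokens.
TOKEN_SPEC = [
    ("MEMBER_COMMENT", r";[ \t]*//[^\n\r]*[\n\r]"),
    ("CPP_COMMENT", r"//[^\n\r]*[\n\r]"),
    ("WS", r"[ \t\n\r]+"),
    ("KEYWORD", r"struct|int"),
    ("IDENTIFIER", r"[A-Za-z_]+"),
    ("PUNCT", r"[{};]"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]

# Tokens the grammar sends to the hidden channel.
HIDDEN = {"CPP_COMMENT", "WS"}


def tokenize(text):
    """Return the visible (name, text) tokens, longest match first."""
    pos = 0
    tokens = []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            # Strictly longer wins, so earlier rules win ties.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_name not in HIDDEN:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


SAMPLE = (
    "struct foo // ignored comment 'foo'\n"
    "{\n"
    "\tint a; // comment on 'a' that I want in the AST\n"
    "\t// ignored comment\n"
    "\tint b;\n"
    "\t// another ignored comment\n"
    "};\n"
)

if __name__ == "__main__":
    for name, value in tokenize(SAMPLE):
        print(f"{name:15} {value!r}")
```

In this imitation, only the comment after "int a" survives as a
MEMBER_COMMENT, because it is the only one that starts at a semicolon.
The 'foo' comment starts right after an identifier, so it matches
CPP_COMMENT and is hidden like the others.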

