[antlr-interest] Nested multi-line comments

Peter Boughton boughtonp at gmail.com
Sat Oct 24 09:40:54 PDT 2009


How do I support nested comments with ANTLR?

The standard example won't work with either greedy or non-greedy
matching; it matches either too much or not enough.

Here is a basic sample:
	<!---
		a comment
		<!--- nested comment --->
		still comment
	--->
	this is not commented
	<!--- more commenting --->

I want to tokenise comments, rather than ignore them, since the text
inside might be significant later on.

How the comments are tokenised doesn't matter, so I don't mind if
that first comment is split into one, two or three tokens.
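A minimal sketch of the multi-token option. It is a hand-written Java scanner, not an ANTLR rule, and the names `NestedCommentTokens`, `Kind`, `Token` and `tokenize` are hypothetical. It tracks nesting with a depth counter: `<!---` and `--->` each become their own token, and any text between them is classed as comment text whenever the depth is above zero.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a hand-rolled scanner, not ANTLR-generated code.
class NestedCommentTokens {
    enum Kind { COMMENT_START, COMMENT_END, COMMENT_TEXT, TEXT }

    record Token(Kind kind, String text) {}

    static final String START = "<!---";
    static final String END = "--->";

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // text not yet emitted
        int depth = 0;                               // current nesting level
        int i = 0;
        while (i < input.length()) {
            if (input.startsWith(START, i)) {
                flush(tokens, pending, depth);
                tokens.add(new Token(Kind.COMMENT_START, START));
                depth++;
                i += START.length();
            } else if (depth > 0 && input.startsWith(END, i)) {
                // Only treat "--->" as a closer while inside a comment.
                flush(tokens, pending, depth);
                tokens.add(new Token(Kind.COMMENT_END, END));
                depth--;
                i += END.length();
            } else {
                pending.append(input.charAt(i));
                i++;
            }
        }
        flush(tokens, pending, depth);
        return tokens;
    }

    // Emit buffered text as comment text if we are inside a comment, else plain text.
    private static void flush(List<Token> tokens, StringBuilder pending, int depth) {
        if (pending.length() > 0) {
            tokens.add(new Token(depth > 0 ? Kind.COMMENT_TEXT : Kind.TEXT, pending.toString()));
            pending.setLength(0);
        }
    }
}
```

With this scheme a stray `--->` outside any comment stays ordinary text, and an unclosed comment simply runs to the end of the input. Records need Java 16 or later.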


However, given that comments can be nested to an arbitrary level, I'm
having trouble working out how to define the rule without it recursing
into itself.

(For practical purposes I could probably set a maximum depth of ten to
twenty levels, but I'd prefer not to create potential edge cases like
that.)
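Balanced nesting of a single comment type only needs a count, not a stack, so one integer counter handles arbitrary depth with no fixed maximum. A minimal sketch, written as a hand-rolled Java helper rather than ANTLR syntax (the names `NestedComment` and `commentEnd` are hypothetical), that finds where the outermost comment ends so the whole thing can be emitted as a single token:

```java
// Hypothetical helper, not part of ANTLR: depth-counting match for nested comments.
class NestedComment {
    static final String START = "<!---";
    static final String END = "--->";

    /**
     * Expects a comment to open at position start. Returns the index just past
     * the "--->" that closes it, or -1 if the comment is never closed.
     */
    static int commentEnd(String input, int start) {
        if (!input.startsWith(START, start)) {
            throw new IllegalArgumentException("no comment opens at " + start);
        }
        int depth = 0;
        int i = start;
        while (i < input.length()) {
            if (input.startsWith(START, i)) {
                depth++;                 // entering a (possibly nested) comment
                i += START.length();
            } else if (input.startsWith(END, i)) {
                depth--;                 // leaving one level
                i += END.length();
                if (depth == 0) {
                    return i;            // outermost comment closed
                }
            } else {
                i++;
            }
        }
        return -1;                       // ran out of input while still nested
    }
}
```

For the sample above, calling `commentEnd` at offset 0 returns the position just after the outer `--->`, so "this is not commented" falls outside the comment token.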


Here's my current attempt...

	COMMENT:
		SUB_CommentStart
		( SUB_NoComment | COMMENT )*
		SUB_CommentEnd
	;
	
	fragment SUB_CommentStart : '<!---' ;
	fragment SUB_CommentEnd   : '--->' ;
	fragment SUB_NoComment    : ~(SUB_CommentStart|SUB_CommentEnd)+;


This, of course, fails because the rule includes itself, so what's the
best way to fix that?

