[antlr-interest] Having trouble with line numbers in ML_COMMENTS

Alex Shneyderman a.shneyderman at gmail.com
Thu Mar 22 00:39:44 PDT 2007


I took the java15.g grammar by these fellas:
 *              John Mitchell
 *		Terence Parr
 *		John Lilley		
 *		Scott Stanchfield	
 *		Markus Mohnen		
 *		Peter Williams		
 *		Allan Jacobs		
 *		Steve Messick	
 *		John Pybus		

from the ANTLR site. It has this rule in the lexer part:

// multiple-line comments
ML_COMMENT
	:	"/*" ~('*')
		(	/*	'\r' '\n' can be matched in one alternative or by matching
				'\r' in one iteration and '\n' in another. I am trying to
				handle any flavor of newline that comes in, but the language
				that allows both "\r\n" and "\r" and "\n" to all be valid
				newline is ambiguous. Consequently, the resulting grammar
				must be ambiguous. I'm shutting this warning off.
			 */
			options {
				generateAmbigWarnings=false;
			}
		:
			{ LA(2)!='/' }? '*'
		|	'\r' '\n'		{newline();}
		|	'\r'			{newline();}
		|	'\n'			{newline();}
		|	~('*'|'\n'|'\r')
		)*
		"*/"
		{$setType(Token.SKIP);}
	;

The problem I am seeing is that whenever my source has one of
those, the line numbering is one less than it should be. If I have two
ML_COMMENTs the numbering is off by 2, and so on. I have a dirty fix
for it, like so:

// multiple-line comments
ML_COMMENT
	:	"/*" ~('*')
		(	/*	'\r' '\n' can be matched in one alternative or by matching
				'\r' in one iteration and '\n' in another. I am trying to
				handle any flavor of newline that comes in, but the language
				that allows both "\r\n" and "\r" and "\n" to all be valid
				newline is ambiguous. Consequently, the resulting grammar
				must be ambiguous. I'm shutting this warning off.
			 */
			options {
				generateAmbigWarnings=false;
			}
		:
			{ LA(2)!='/' }? '*'
		|	'\r' '\n'		{newline();}
		|	'\r'			{newline();}
		|	'\n'			{newline();}
		|	~('*'|'\n'|'\r')
		)*
		"*/" {newline();} /* this is my dirty fix */
		{$setType(Token.SKIP);}
	;

But I cannot understand why the original version does not work correctly.
And of course my dirty fix will fail miserably when someone codes like so:

1. /* ml comment on one line */ int i = 0;

My int i will then be reported as being on the second line.
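
One guess, not verified against the generated lexer: the ~('*') right after
"/*" has no action attached, so if that first character is a '\n' (or a lone
'\r') it gets consumed without newline() being called. That would lose one
line for every comment that breaks immediately after "/*". The Java sketch
below is a hand-written simulation of that reading of the rule, not ANTLR
output; the class and method names are made up for illustration.

```java
public class MlCommentLineCount {

    /**
     * Hand-written approximation of the ML_COMMENT rule (not ANTLR output).
     * Returns how many times newline() would be called while matching one
     * comment. extraNewlineAtEnd models the workaround that calls newline()
     * after the closing delimiter.
     */
    static int newlineCalls(String comment, boolean extraNewlineAtEnd) {
        boolean matchable = comment.length() >= 5
                && comment.startsWith("/*")
                && comment.charAt(2) != '*'
                && comment.endsWith("*/");
        if (!matchable) {
            throw new IllegalArgumentException("rule would not match: " + comment);
        }
        int calls = 0;
        int end = comment.length() - 2; // index where the closing delimiter starts
        int i = 3;                      // index 2 was consumed by ~('*'); no action runs
        while (i < end) {
            char c = comment.charAt(i);
            if (c == '\r' && i + 1 < end && comment.charAt(i + 1) == '\n') {
                calls++;                // the '\r' '\n' alternative counts one line
                i += 2;
            } else {
                if (c == '\r' || c == '\n') {
                    calls++;            // the lone '\r' or '\n' alternatives
                }
                i++;
            }
        }
        if (extraNewlineAtEnd) {
            calls++;                    // the workaround's unconditional newline()
        }
        return calls;
    }

    /** Counts line breaks actually present, treating "\r\n" as one. */
    static int actualLineBreaks(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean crBeforeLf = c == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n';
            if (c == '\n' || (c == '\r' && !crBeforeLf)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String[] samples = {
            "/*\n * text\n */",              // breaks right after the opening delimiter
            "/* text\n */",                  // first character is a space
            "/* ml comment on one line */"   // no line break at all
        };
        for (String s : samples) {
            System.out.println(actualLineBreaks(s) + " actual, "
                    + newlineCalls(s, false) + " counted by the original rule, "
                    + newlineCalls(s, true) + " counted with the extra newline()");
        }
    }
}
```

If this guess is right, the extra newline() only happens to compensate for
comments that break immediately after "/*", and it overcounts for every
other comment, including the one-line case above.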

So, I wonder if anyone can explain it and suggest what to do?

-- 
Thanks,
Alex.

