[antlr-interest] Having trouble with line numbers in ML_COMMENTS

Alex Shneyderman a.shneyderman at gmail.com
Thu Mar 22 01:41:17 PDT 2007


On 3/22/07, Gavin Lambert <antlr at mirality.co.nz> wrote:
> At 19:39 22/03/2007, Alex Shneyderman wrote:
>  >the problem I am seeing is that whenever my source has one of
>  >those, the line numbering is one less than it should be. If I have
>  >two ML_COMMENTS the numbering is off by 2, and so on. I have a dirty
>  >fix for it like so:
> [...]
>  >but I can not understand why the original version is not working
>  >correctly.
>  >And of course my dirty fix will fail miserably when someone codes
>  >like so:
>  >
>  >1. /* ml comment on one line */ int i = 0;
>  >
>  >my int i will be on the second line.
>  >
>  >So, I wonder if anyone can explain it and suggest what to do?



> Are you sure all your other rules containing newline characters
> call newline() similarly?  In particular, how does your main
> newline/whitespace rule look?  Possibly you've forgotten a set of
> brackets or something so it's not doing what you think it's doing.

Well, the rules are not mine :-) As I said, the .g file is taken from
the antlr site and I have not tinkered with it much, except for
including things in the AST that I need for my project and that were
otherwise omitted with the ! notation.

I think I understand the particular problem with ML_COMMENT now.
The problematic bit of the rule:

ML_COMMENT
	:	"/*" ~('*')
		(	options {
				generateAmbigWarnings=false;
			}
		:
			{ LA(2)!='/' }? '*'
		|	'\r' '\n'		{newline();}
		|	'\r'			{newline();}
		|	'\n'			{newline();}
		|	~('*'|'\n'|'\r')
		)*
		"*/"
		{$setType(Token.SKIP);}
	;


is this match on the second line of the rule:
	:	"/*" ~('*')

so if one has a comment like this:
1. /*\n
2.  *\n
3.  */\n
4. int i = 0;

where \n is a newline. The reported line number of int i = 0; is 3
instead of 4. What happens is that ~('*') matches any single character
other than '*', so it matches the \n right after "/*". That \n is
swallowed without a call to newline(). To test my theory I added an
extra space, like so:

1. /* \n
2.  *\n
3.  */\n
4. int i = 0;

Note the extra space on the first line. Now the line number of int
i = 0; is 4, because ~('*') matches the space, and the subsequent part
of the rule matches '\n' and calls newline().

Anyway, I just took a look at the grammar that is published on the web site,
http://www.antlr.org/grammar/1090713067533/java15.g, instead of the one
that comes with the src distribution, and it differs. In this
particular rule the ~('*') is removed :-) and it works.
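
For anyone who wants to check the reasoning without regenerating the
lexer, here is a rough Python model of the rule. It is only a sketch and
not ANTLR output: the function name reported_line and the
leading_not_star flag are made up for illustration, and it assumes that
whitespace outside the comment is counted correctly by the other rules.

```python
def reported_line(src, token, leading_not_star):
    """Model of ML_COMMENT followed by ordinary whitespace handling.

    src must start with "/*". Returns the line number the lexer would
    report for `token`. leading_not_star=True models the rule with the
    extra ~('*') after "/*"; False models the rule without it.
    """
    if not src.startswith("/*"):
        raise ValueError("source must start with a multi-line comment")
    line = 1
    pos = 2
    if leading_not_star:
        # ~('*') consumes one character that is not '*'.
        # There is no action attached, so a newline here is never counted.
        if src[pos] == "*":
            raise ValueError("~('*') cannot match '*'")
        pos += 1
    while not src.startswith("*/", pos):
        if pos >= len(src):
            raise ValueError("unterminated comment")
        if src.startswith("\r\n", pos):
            line += 1          # '\r' '\n' {newline();}
            pos += 2
        elif src[pos] in "\r\n":
            line += 1          # '\r' {newline();} or '\n' {newline();}
            pos += 1
        else:
            pos += 1           # '*' not followed by '/', or any other char
    pos += 2                   # consume "*/"
    # Assumption: newlines after the comment are counted correctly.
    end = src.index(token, pos)
    return line + src.count("\n", pos, end)


no_space = "/*\n *\n */\nint i = 0;"
with_space = "/* \n *\n */\nint i = 0;"

print(reported_line(no_space, "int", True))     # 3 (off by one)
print(reported_line(with_space, "int", True))   # 4
print(reported_line(no_space, "int", False))    # 4
print(reported_line(with_space, "int", False))  # 4
```

The first two results match what I saw with the grammar from the src
distribution; the last two correspond to the rule without ~('*').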

