[antlr-interest] Parsing line by line and multiline comments

Thu Apr 12 03:29:32 PDT 2012

Hi,

I am trying to parse some Java source code, and I have run into a problem
because the parsing is done by creating a new lexer for each line that the
IDE sends. The problem is with multi-line comments: the COMMENT rule in the
original grammar tries to match the closing */ token, which is usually not
on the same line as the opening /*.
I have two strategies to resolve this problem:

1) Parse the entire file at least once to identify where the multi-line
    comments are. I will try this approach once I have resolved the problem
    with 2), so that I can compare performance.

2) Modify the grammar so that it does not require the closing */, and
    maintain a flag that records whether I am inside a block comment.
    So I have modified the Java 1.6 grammar like this:

COMMENT
	:	'/*'
		{
			InBlockComment = true;
			$channel = Hidden;
		}
		(
			~('*')
			|	('*' ~'/') => '*'
		)*
		('*/' {InBlockComment = false;})?
	;

and in the lexer class I override NextToken:

public override IToken NextToken()
{
    IToken next = base.NextToken();

    // While inside a block comment, retag every token as a hidden COMMENT.
    if ( next.Type != EOF && InBlockComment && next.Type != COMMENT )
    {
        // The closing token ends the block comment.
        if ( next.Type == END_BLOCK_COMMENT )
            InBlockComment = false;

        next.Type = COMMENT;
        next.Channel = Hidden;
    }

    return next;
}

For instance, I have a problem with the following code:

/*
* ' I am inside a comment block and I am not a char literal
*/

because when I look at the NextToken values at each step I get:
/* => COMMENT (InBlockComment is set to true, see above)
*  => STAR, but inside NextToken we force it to be a COMMENT
EXCEPTION here, because the lexer ends up inside the CHARLITERAL rule and
tries to find the matching '
So my question is: how can I "force" the lexer to start in another state?
In my case, once I have detected that I am in a block comment, I would like
it to parse the next line starting in that state.
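
To make concrete what I mean by "starting in that state", here is a minimal
sketch in plain Java. It does not use ANTLR or the C# target shown above, and
BlockCommentTracker, endsInsideComment and startFlags are hypothetical names
of mine, not anything from the grammar or the runtime. The idea is that the
flag is checked before any normal rule is tried, so a ' inside a comment is
never taken as the start of a CHARLITERAL. Run over the whole file, the same
loop would also give the per-line flags that strategy 1) needs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of ANTLR or the Java 1.6 grammar. */
final class BlockCommentTracker {

    /**
     * Scans one line and returns true if a block comment is still open at
     * the end of it. startInComment says whether the line begins inside one.
     */
    static boolean endsInsideComment(String line, boolean startInComment) {
        boolean inComment = startInComment;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inComment) {
                // Inside a comment only "*/" matters; quotes are plain text.
                if (line.startsWith("*/", i)) {
                    inComment = false;
                    i += 2;
                } else {
                    i++;
                }
            } else if (line.startsWith("/*", i)) {
                inComment = true;
                i += 2;
            } else if (line.startsWith("//", i)) {
                break; // line comment: the rest of the line is ignored
            } else if (c == '"' || c == '\'') {
                i = skipLiteral(line, i); // so "/*" inside a literal is not a comment
            } else {
                i++;
            }
        }
        return inComment;
    }

    /** Returns the index just past the string or char literal starting at start. */
    private static int skipLiteral(String line, int start) {
        char quote = line.charAt(start);
        int i = start + 1;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character
            } else if (c == quote) {
                return i + 1;
            } else {
                i++;
            }
        }
        return line.length(); // unterminated literal: stop at end of line
    }

    /** For each line, whether it starts inside a block comment (strategy 1). */
    static List<Boolean> startFlags(List<String> lines) {
        List<Boolean> flags = new ArrayList<>();
        boolean inComment = false;
        for (String line : lines) {
            flags.add(inComment);
            inComment = endsInsideComment(line, inComment);
        }
        return flags;
    }
}
```

For the three-line example above followed by a line "int x = 1;", startFlags
gives false, true, true, false, so the second and third lines would be handed
to the lexer with the flag already set. What I am missing is the equivalent
hook in the generated ANTLR lexer.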

Thanks