[antlr-interest] Parsing line by line and multiline comments
forumer at smartmobili.com
Thu Apr 12 03:29:32 PDT 2012
Hi,
I am trying to parse some Java source code and I am running into an issue
because the parsing is done by creating a new lexer for each line that the
IDE sends.
The problem is with multi-line comments, because the original grammar tries
to match the closing */ token, which is usually not on the same line.
I have two strategies to resolve this problem:
1) Parse the entire file at least once to identify where the multi-line
   comments are. I will try this approach once I have resolved the problem
   with 2), so that I can compare performance.
2) Modify the grammar so that it does not require the closing */, and
   maintain a flag that records whether I am inside a block comment. So I
   have modified the Java 1.6 grammar like this:
COMMENT
    :   '/*'
        {
            InBlockComment = true;
            $channel = Hidden;
        }
        (
            ~('*')
        |   ('*' ~'/') => '*'
        )*
        ('*/' {InBlockComment = false;})?
    ;
and in the code I have
public override IToken NextToken()
{
    IToken next = base.NextToken();
    if ( next.Type != EOF && InBlockComment && next.Type != COMMENT )
    {
        if ( next.Type == END_BLOCK_COMMENT )
            InBlockComment = false;
        next.Type = COMMENT;
        next.Channel = Hidden;
    }
    return next;
}
The problem shows up, for instance, with the following code:
/*
* ' I am inside a comment block and I am not a char literal
*/
because when I look at the values returned by NextToken at each step I get:
/* => COMMENT (we set InBlockComment to true, see above)
*  => STAR, but inside NextToken we force it to be a COMMENT
EXCEPTION here, because we end up inside the CHARLITERAL rule and it tries
to find the matching '
So my question is: how can I "force" the lexer to start in another state?
In my case, once I have detected that I am in a block comment, I would like
it to parse the line starting in that state.
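
To make the idea concrete, here is a rough sketch of the state I would like to carry from one line to the next. It is plain Java and does not use ANTLR at all; the class name BlockCommentState and the methods scanLine and startStates are made up for illustration, and it only looks at comments and quoted literals, not at real Java tokens. It also shows what strategy 1) would compute in a single pass over the file: for each line, whether it starts inside a block comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ANTLR or of the Java 1.6 grammar.
class BlockCommentState {

    // Scans one line, given whether the previous line ended inside a block
    // comment, and returns true if this line also ends inside one.
    static boolean scanLine(String line, boolean inBlockComment) {
        int i = 0;
        int n = line.length();
        while (i < n) {
            char c = line.charAt(i);
            if (inBlockComment) {
                // Inside a comment only "*/" matters; quotes are plain text.
                if (c == '*' && i + 1 < n && line.charAt(i + 1) == '/') {
                    inBlockComment = false;
                    i += 2;
                } else {
                    i++;
                }
            } else if (c == '/' && i + 1 < n && line.charAt(i + 1) == '*') {
                inBlockComment = true;
                i += 2;
            } else if (c == '/' && i + 1 < n && line.charAt(i + 1) == '/') {
                break; // line comment: the rest of the line cannot open a block
            } else if (c == '"' || c == '\'') {
                // Skip a string or char literal so that "/*" inside it is ignored.
                i++;
                while (i < n && line.charAt(i) != c) {
                    if (line.charAt(i) == '\\') {
                        i++; // skip the escaped character
                    }
                    i++;
                }
                i++; // step over the closing quote
            } else {
                i++;
            }
        }
        return inBlockComment;
    }

    // Strategy 1 as a pre-pass: for every line, record whether it STARTS
    // inside a block comment.
    static List<Boolean> startStates(List<String> lines) {
        List<Boolean> states = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            states.add(inside);
            inside = scanLine(line, inside);
        }
        return states;
    }
}
```

With something like this, the second line of my example would be known to start inside a comment, so the ' on it would never be handed to the CHARLITERAL rule. What I do not know is how to give the generated lexer that starting state.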
Thanks