[antlr-interest] How do I find */
Vidar Håkestad
vidar at hawkis.com
Tue Sep 6 12:43:26 PDT 2005
Hello, Interest
I'm trying to split Java documentary comments and Java code into two separate
lexers (as hinted in the 'ANTLR Specification: Token Streams' section).
What I want is for the sub-parser/lexer (for the JavaDoc) to just hand me the
full block as is. No parsing is needed; I just want the documentary comment
with stars and all (except maybe without the entry and exit tokens, i.e. /**
and */).
In my JavaSrcLexer, I have the following start of the ML_COMMENT:
ML_COMMENT
    : "/*" ~'*'
    etc
to avoid any problems with ambiguity. Then the actual documentary comment
starts with:
JAVADOC_OPEN
    : "/**" {selector.push("srclexer");}
    ;
and in the javadoc parser rule in JavaSrcParser:
javadoc
    : JAVADOC_OPEN
      {
        // Create a (sub) parser to handle the javadoc comment
        //
        JavaDocParser jdocparser = new JavaDocParser(getInputState());
        jdocparser.content();
      }
    ;
I have created a separate lexer for the actual documentary comment;
JavaDocLexer, where I have the following end rule for a documentary block:
JAVADOC_CLOSE
    : "*/" {selector.pop();} // Pops the stream back to JavaSrcLexer/Parser
    ;
This scheme seems to work as far as invoking the JavaDocParser's content rule,
but when I try to keep the internals of the original ML_COMMENT rule (from
the original Java Lexer rules), which looks like this:
DOC_CONTENT
    :
      ( /* '\r' '\n' can be matched in one alternative or by matching
           '\r' in one iteration and '\n' in another. I am trying to
           handle any flavor of newline that comes in, but the language
           that allows both "\r\n" and "\r" and "\n" to all be valid
           newline is ambiguous. Consequently, the resulting grammar
           must be ambiguous. I'm shutting this warning off.
         */
        options {
          generateAmbigWarnings=false;
        }
      :
        { LA(2)!='/' }? '*'
      | '\r' '\n' {newline();}
      | '\r' {newline();}
      | '\n' {newline();}
      | ~('*'|'\n'|'\r')
      )*
    ;
called from the production (in JavaDocParser)
content
    : javaDoc:DOC_CONTENT
      {
        System.err.println("Content of block is "+javaDoc.getText());
      }
    ;
I get errors as soon as any character is found. I suspect that the content
rule for the internals of a comment should be quite different. The question
is what it should look like. To me, it should accept any character sequence
until the */ pattern is found (and then consume that as well before letting
the main parser/lexer regain control).
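
For illustration only, the behaviour described above (take every character up to the closing */, then consume the */ itself) can be sketched in plain Java, outside of any ANTLR grammar. The class DocCommentScanner, its Result type, and the scan method are hypothetical names made up for this sketch; they are not part of ANTLR or of the grammar in this post. The sketch assumes the scan starts just after the opening /** and counts "\r\n", "\r" and "\n" each as one newline, mirroring the alternatives in DOC_CONTENT.

```java
/** Hypothetical helper, not part of ANTLR: collects a comment body up to the closing marker. */
final class DocCommentScanner {

    /** What the scan produced: the raw body, where scanning stopped, and newlines seen. */
    static final class Result {
        final String body;     // text between the opening and closing markers, stars and all
        final int nextIndex;   // index of the first character after the closing "*/"
        final int newlines;    // number of line terminators inside the body

        Result(String body, int nextIndex, int newlines) {
            this.body = body;
            this.nextIndex = nextIndex;
            this.newlines = newlines;
        }
    }

    /**
     * Scans input from start (assumed to be just past the opening marker) until "*\/" is seen.
     * The closing marker is consumed but not included in the body.
     */
    static Result scan(String input, int start) {
        StringBuilder body = new StringBuilder();
        int newlines = 0;
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            // One character of lookahead: a '*' only ends the comment if '/' follows it.
            if (c == '*' && i + 1 < input.length() && input.charAt(i + 1) == '/') {
                return new Result(body.toString(), i + 2, newlines);
            }
            if (c == '\r') {
                // Treat "\r\n" as a single newline, and a lone "\r" as a newline too.
                newlines++;
                body.append(c);
                i++;
                if (i < input.length() && input.charAt(i) == '\n') {
                    body.append('\n');
                    i++;
                }
                continue;
            }
            if (c == '\n') {
                newlines++;
            }
            body.append(c);
            i++;
        }
        throw new IllegalArgumentException("unterminated documentation comment");
    }
}
```

Calling DocCommentScanner.scan(source, 3) on a string that begins with /** would return the comment body unchanged and leave nextIndex pointing at the first character of the Java code that follows, which is where the main lexer would resume.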
So how do I find the whole content of the documentary comment, i.e. stop
parsing in the JavaDocParser/Lexer as soon as I see the "*/" token?
Any ideas to get me started?
Regards
Vidar