[antlr-interest] How do I find */

Vidar Håkestad vidar at hawkis.com
Tue Sep 6 12:43:26 PDT 2005


Hello, Interest
I'm trying to split Java documentation comments and Java code into two separate 
lexers (as hinted in the 'ANTLR Specification: Token Streams' section).

What I want is for the sub-parser/lexer (for the JavaDoc) to just hand me 
the full block as is. No parsing is needed; I just want the documentation 
comment with stars and all (except perhaps the entry and exit tokens, 
i.e. /** and */).

In my JavaSrcLexer, I have the following start of the ML_COMMENT:
ML_COMMENT
	: "/*" ~'*'
             etc
to avoid any problems with ambiguity. Then the actual documentation comment 
starts with:
JAVADOC_OPEN
	: "/**" {selector.push("srclexer");}
	;

and in the javadoc parser rule in JavaSrcParser:
javadoc
	:  JAVADOC_OPEN
		{
			// Create a (sub) parser to handle the javadoc comment
			//
			JavaDocParser jdocparser = new JavaDocParser(getInputState());
			jdocparser.content();
		}
	;

I have created a separate lexer for the actual documentation comment, 
JavaDocLexer, where I have the following end rule for a documentation block:

JAVADOC_CLOSE
	: "*/" {selector.pop();} // Pops the stream back to JavaSrcLexer/Parser
	;

This scheme seems to work as far as invoking the JavaDocParser's content rule. 
The trouble starts when I reuse the internals of the original ML_COMMENT rule 
(from the original Java lexer rules), which look like this:

DOC_CONTENT
	:
		(	/*	'\r' '\n' can be matched in one alternative or by matching
				'\r' in one iteration and '\n' in another. I am trying to
				handle any flavor of newline that comes in, but the language
				that allows both "\r\n" and "\r" and "\n" to all be valid
				newline is ambiguous. Consequently, the resulting grammar
				must be ambiguous. I'm shutting this warning off.
			 */
			options {
				generateAmbigWarnings=false;
			}
		:
			{ LA(2)!='/' }? '*'
		|	'\r' '\n'		{newline();}
		|	'\r'			{newline();}
		|	'\n'			{newline();}
		|	~('*'|'\n'|'\r')
		)*
	;

This rule is called from the production (in JavaDocParser):
content
	: javaDoc:DOC_CONTENT
		{
			System.err.println("Content of block is "+javaDoc.getText());
		}
	;

I get errors as soon as any character is found. I suspect that the content 
rule for the internals of a comment should be quite different. The question 
is what it should look like. To me, it should accept any character sequence 
until the */ pattern is found (and then consume that as well before letting 
the main parser/lexer regain control).
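
To make the intended behaviour concrete, here is a minimal sketch in plain 
Java, outside ANTLR altogether. The class JavadocScan, the Result holder and 
the method scanBody are names made up for this illustration; none of them is 
part of ANTLR or of my grammars. It only shows what I expect the content rule 
to do: take everything up to the first closing delimiter, then skip past it.

```java
public class JavadocScan {
    // Hypothetical holder: the raw comment body plus the index just past the close.
    static final class Result {
        final String body;
        final int next;
        Result(String body, int next) { this.body = body; this.next = next; }
    }

    // Scans src from 'from' (assumed to sit just past the opening delimiter) and
    // returns everything before the first closing delimiter, stars and all.
    static Result scanBody(String src, int from) {
        int close = src.indexOf("*/", from);
        if (close < 0) {
            throw new IllegalArgumentException("unterminated documentation comment");
        }
        // Consume the two-character close as well, so the caller resumes after it.
        return new Result(src.substring(from, close), close + 2);
    }

    public static void main(String[] args) {
        String src = "/** Hi\n * there */class A {}";
        Result r = scanBody(src, 3); // 3 == length of the opening delimiter
        System.out.println("Content of block is [" + r.body + "]");
        System.out.println("Main lexer resumes at: " + src.substring(r.next));
    }
}
```

A lone star inside the body does not end the scan, since only a star 
immediately followed by a slash counts as the close. That is the behaviour I 
am after in the lexer rule too.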

So how do I find the whole content of the documentation comment, i.e. stop 
parsing in the JavaDocParser/Lexer as soon as I see the "*/" token?

Any ideas to get me started?

Regards
Vidar

