[antlr-interest] How do I find */

Vidar Håkestad vidar at hawkis.com
Tue Sep 6 14:13:06 PDT 2005


Found documentation on this in 'Lexical Analysis with ANTLR', section 
'Nongreedy Lexer Subrules', so I tried the following in my JavaDocLexer:
DOC_CONTENT
    :  // Note: Nothing here, as the '/**' gets eaten by the outer lexer.
    (options {generateAmbigWarnings=false;}
        :   { LA(2)!='/' }? '*'    // as long as */ is not found?
        |   '\r' '\n'    {newline();}
        |   '\r'         {newline();}  // Retain line breaks
        |   '\n'         {newline();}
        |   .  // dot == any character
    )*
    "*/"
    {
        selector.pop();
    }
    ;

Trying to parse /** */ then gives me
line 13:5: expecting DOC_CONTENT, found '*/'

I did not expect that at all.
This is the only lexer rule at the moment, and the parser contains only one 
rule that references it.

I also tried greedy=false; in the subrule's options block, which didn't work 
either.
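For reference, the behavior a nongreedy "( . )*" loop followed by "*/" is meant to express can be sketched as a hand-written Java scanner. This is a minimal sketch, not the code ANTLR generates: the names DocScan, NongreedyDocScanner, scan and resumeAt are hypothetical, and the scan is assumed to start just after the "/**" that the outer lexer has already consumed. The point it illustrates is ordering: the exit test for "*/" runs before any character is consumed, so a lone '*' is still accepted as content, and "\r\n", "\r" and "\n" each count as one newline, as in the three newline alternatives.

```java
// Result of scanning one documentation-comment body (hypothetical type).
final class DocScan {
    final String content;   // text between "/**" and the closing "*/"
    final int resumeAt;     // index just past the closing "*/"
    final int newlines;     // newlines seen; "\r\n" counts once

    DocScan(String content, int resumeAt, int newlines) {
        this.content = content;
        this.resumeAt = resumeAt;
        this.newlines = newlines;
    }
}

final class NongreedyDocScanner {
    // Scans from 'start' (just after "/**") up to and including the first "*/".
    // Nongreedy: the exit test for "*/" is tried before any character is consumed.
    static DocScan scan(String in, int start) {
        int i = start;
        int newlines = 0;
        while (true) {
            if (in.startsWith("*/", i)) {
                return new DocScan(in.substring(start, i), i + 2, newlines);
            }
            if (i >= in.length()) {
                throw new IllegalStateException("unterminated doc comment starting at " + start);
            }
            char c = in.charAt(i);
            if (c == '\r' && i + 1 < in.length() && in.charAt(i + 1) == '\n') {
                newlines++;          // "\r\n" is one newline
                i += 2;
            } else {
                if (c == '\r' || c == '\n') {
                    newlines++;      // lone '\r' or '\n'
                }
                i++;                 // any other character, including a lone '*'
            }
        }
    }

    public static void main(String[] args) {
        DocScan r = scan("/** */", 3);
        System.out.println("[" + r.content + "] resume at " + r.resumeAt);
    }
}
```

With this ordering, scanning "/** */" from index 3 yields the content " " and resumes at index 6, just past the closing "*/".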

Vidar

On Tuesday 06 September 2005 21:43, Vidar Håkestad wrote:
> Hello, Interest
> I'm trying to split Java documentation comments and Java code into two
> separate lexers (as hinted in the 'ANTLR Specification: Token Streams'
> section).
>
> What I want to accomplish is for the sub-parser/lexer (for the JavaDoc) to
> just hand me the full block as is. No parsing is needed; I just want the
> documentation comment, stars and all (except maybe without the entry and
> exit tokens, i.e. /** and */).
>
> In my JavaSrcLexer, the ML_COMMENT rule starts like this:
> ML_COMMENT
> 	: "/*" ~'*'
>              etc
> The ~'*' is there to avoid any ambiguity with the documentation comment,
> which starts with:
> JAVADOC_OPEN
> 	: "/**" {selector.push("srclexer");}
> 	;
>
> and in the javadoc parser rule in JavaSrcParser:
> javadoc
> 	:  JAVADOC_OPEN
> 		{
> 			// Create a (sub) parser to handle the javadoc comment
> 			//
> 			JavaDocParser jdocparser = new JavaDocParser(getInputState());
> 			jdocparser.content();
> 		}
> 	;
>
> I have created a separate lexer for the documentation comment itself,
> JavaDocLexer, which has the following end rule for a documentation block:
>
> JAVADOC_CLOSE
> 	: "*/" {selector.pop();} // Pops the stream back to JavaSrcLexer/Parser
> 	;
>
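The push/pop scheme described here can be modeled in a few lines of plain Java to make the control flow concrete. This is a toy sketch, not ANTLR's TokenStreamSelector: ToySelector, ToySelectorDemo, the "src"/"doc" names and the SRC[...]/DOC_CONTENT[...] token strings are all hypothetical. Seeing "/**" pushes the doc lexer (saving the active one), and seeing "*/" pops back to whichever lexer was active before.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a token-stream selector: a stack of lexer names.
final class ToySelector {
    private final Deque<String> stack = new ArrayDeque<>();
    private String current;

    ToySelector(String initial) { current = initial; }

    String current() { return current; }

    // Save the active lexer and switch to 'name'.
    void push(String name) {
        stack.push(current);
        current = name;
    }

    // Switch back to the lexer that was active before the last push.
    void pop() { current = stack.pop(); }
}

final class ToySelectorDemo {
    // Splits the input into source runs and doc-comment tokens by switching
    // between a "src" lexer and a "doc" lexer through the selector stack.
    static List<String> tokenize(String in) {
        ToySelector selector = new ToySelector("src");
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            if (selector.current().equals("src")) {
                int open = in.indexOf("/**", i);
                int stop = open < 0 ? in.length() : open;
                if (stop > i) tokens.add("SRC[" + in.substring(i, stop) + "]");
                if (open < 0) break;
                tokens.add("JAVADOC_OPEN");
                selector.push("doc");      // switch to the doc lexer on "/**"
                i = open + 3;
            } else {
                int close = in.indexOf("*/", i);
                if (close < 0) throw new IllegalStateException("unterminated doc comment");
                tokens.add("DOC_CONTENT[" + in.substring(i, close) + "]");
                tokens.add("JAVADOC_CLOSE");
                selector.pop();            // back to the source lexer on "*/"
                i = close + 2;
            }
        }
        if (!selector.current().equals("src")) {
            throw new IllegalStateException("input ended inside a doc comment");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int a; /** hi */ int b;"));
    }
}
```

For the input "int a; /** hi */ int b;" this yields, in order, a SRC run, JAVADOC_OPEN, DOC_CONTENT[ hi ], JAVADOC_CLOSE and a second SRC run.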
> This scheme seems to work as far as invoking the JavaDocParser's content
> rule, but when I try to keep the internals of the original ML_COMMENT rule
> (from the original Java Lexer rules), which looks like this:
>
> DOC_CONTENT
> 	:
> 		(	/*	'\r' '\n' can be matched in one alternative or by matching
> 				'\r' in one iteration and '\n' in another. I am trying to
> 				handle any flavor of newline that comes in, but the language
> 				that allows both "\r\n" and "\r" and "\n" to all be valid
> 				newline is ambiguous. Consequently, the resulting grammar
> 				must be ambiguous. I'm shutting this warning off.
> 			 */
> 			options {
> 				generateAmbigWarnings=false;
> 			}
> 		:
> 			{ LA(2)!='/' }? '*'
> 		|	'\r' '\n'		{newline();}
> 		|	'\r'			{newline();}
> 		|	'\n'			{newline();}
> 		|	~('*'|'\n'|'\r')
> 		)*
> 	;
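As a rough illustration of where this loop stops, it can be transliterated by hand into Java. This is a sketch under stated assumptions, not the code ANTLR actually generates: PredicatedDocLoop, run, position, line and la are hypothetical names, '\uFFFF' is used here only as an end-of-input marker, and the explicit "*/" match at the end is the step the requirement below asks for. Because '*' is consumed only when the next character is not '/', and the last alternative excludes '*', the loop exits exactly at "*/".

```java
// Hand-written sketch (not ANTLR's generated code) of the loop
//   ( {LA(2)!='/'}? '*' | '\r' '\n' | '\r' | '\n' | ~('*'|'\n'|'\r') )* "*/"
// run over a string, starting just after the opening "/**".
final class PredicatedDocLoop {
    private static final char EOF = '\uFFFF';   // end-of-input marker for this sketch
    private final String in;
    private int pos;
    private int line = 1;

    PredicatedDocLoop(String in, int pos) {
        this.in = in;
        this.pos = pos;
    }

    // 1-based lookahead, in the style of LA(k).
    private char la(int k) {
        int idx = pos + k - 1;
        return idx < in.length() ? in.charAt(idx) : EOF;
    }

    int position() { return pos; }
    int line() { return line; }

    // Returns the text before the closing "*/" and leaves pos just past it.
    String run() {
        int start = pos;
        while (true) {
            char c = la(1);
            if (c == '*' && la(2) != '/') {             // predicated '*' alternative
                pos++;
            } else if (c == '\r' && la(2) == '\n') {    // '\r' '\n'
                pos += 2;
                line++;
            } else if (c == '\r' || c == '\n') {        // lone '\r' or '\n'
                pos++;
                line++;
            } else if (c != '*' && c != EOF) {          // ~('*'|'\n'|'\r')
                pos++;
            } else {
                break;                                  // at "*/" or end of input
            }
        }
        String content = in.substring(start, pos);
        if (la(1) != '*' || la(2) != '/') {
            throw new IllegalStateException("expecting \"*/\" at index " + pos);
        }
        pos += 2;                                       // consume the closing "*/"
        return content;
    }

    public static void main(String[] args) {
        PredicatedDocLoop loop = new PredicatedDocLoop("/** a * b */ int x;", 3);
        System.out.println("[" + loop.run() + "] next index " + loop.position());
    }
}
```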
>
> called from the production (in JavaDocParser)
> content
> 	: javaDoc:DOC_CONTENT
> 		{
> 			System.err.println("Content of block is "+javaDoc.getText());
> 		}
> 	;
>
> I get errors as soon as any character is found. I suspect that the rule for
> the internals of a comment should be quite different. The question is what
> it should look like. To me, it should accept any character sequence until
> the */ pattern is found (and then consume that as well before letting the
> main parser/lexer regain control).
>
> So how do I get the whole content of the documentation comment, i.e. stop
> parsing in the JavaDocParser/Lexer as soon as I see the "*/" token?
>
> Any ideas to get me started?
>
> Regards
> Vidar

