[antlr-interest] How do I find */
Vidar Håkestad
vidar at hawkis.com
Tue Sep 6 14:13:06 PDT 2005
I found documentation on this in 'Lexical Analysis with ANTLR', chapter
'Nongreedy Lexer Subrules', so I tried the following in my JavaDocLexer:
DOC_CONTENT
    :   // Note: Nothing here, as the '/**' gets eaten by the outer lexer.
        (   options {generateAmbigWarnings=false;}
        :   { LA(2)!='/' }? '*'         // as long as */ is not found?
        |   '\r' '\n'   {newline();}
        |   '\r'        {newline();}    // Retain lineshifts
        |   '\n'        {newline();}
        |   .                           // dot == any character
        )*
        "*/"
        {
            selector.pop();
        }
    ;
Trying to parse /** */ then gives me
line 13:5: expecting DOC_CONTENT, found '*/'
which I did not expect at all.
This is the only lexer rule at the moment, and only one rule calls it.
I also tried greedy=false; in the options block, which didn't work
either.
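
For comparison, here is a minimal hand-written sketch, in plain Java rather
than ANTLR, of the behaviour the DOC_CONTENT rule above is meant to have:
accept any character until "*/" is seen, consume the "*/" as well, and count
"\r\n", "\r" and "\n" each as one line break. The class and method names
(DocCommentScanner, scan, Result) are made up for illustration and are not
part of ANTLR or of the grammar in this thread.

```java
// Hypothetical illustration only; none of these names come from ANTLR.
final class DocCommentScanner {

    // What a scan produces: the comment body, the index just past the
    // closing delimiter, and the number of line breaks in the body.
    static final class Result {
        final String body;
        final int end;
        final int newlines;

        Result(String body, int end, int newlines) {
            this.body = body;
            this.end = end;
            this.newlines = newlines;
        }
    }

    // Scans 'input' from 'start', which is assumed to point just past the
    // opening delimiter, and stops after the first '*' followed by '/'.
    static Result scan(String input, int start) {
        StringBuilder body = new StringBuilder();
        int newlines = 0;
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Two characters of lookahead: a '*' ends the body only when
            // the next character is '/'.
            if (c == '*' && i + 1 < input.length() && input.charAt(i + 1) == '/') {
                return new Result(body.toString(), i + 2, newlines);
            }
            if (c == '\r') {
                // "\r\n" and a lone "\r" each count as one line break.
                newlines++;
                body.append(c);
                i++;
                if (i < input.length() && input.charAt(i) == '\n') {
                    body.append('\n');
                    i++;
                }
                continue;
            }
            if (c == '\n') {
                newlines++;
            }
            body.append(c);
            i++;
        }
        throw new IllegalArgumentException("unterminated comment: no closing delimiter found");
    }
}
```

Called as DocCommentScanner.scan(text, 3) on a text that starts with "/**",
this returns the body with stars and all, minus the entry and exit tokens.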
Vidar
On Tuesday 06 September 2005 21:43, Vidar Håkestad wrote:
> Hello, Interest
> I'm trying to split Java documentation comments and Java code into two
> separate lexers (as hinted in the 'ANTLR Specification: Token Streams'
> section).
>
> What I want to accomplish is to have the sub-parser/lexer (for the JavaDoc)
> just hand me the full block as is. No parsing is needed; I just want the
> documentation comment with stars and all (except maybe without the entry and
> exit tokens, i.e. /** and */).
>
> In my JavaSrcLexer, I have the following start of the ML_COMMENT:
> ML_COMMENT
>
> : "/*" ~'*'
>
> etc
> to avoid any problems with ambiguity. Then the actual documentation comment
> starts with:
> JAVADOC_OPEN
>
> : "/**" {selector.push("srclexer");}
>
> ;
>
> and in the javadoc parser rule in JavaSrcParser:
> javadoc
>
> : JAVADOC_OPEN
>
> {
> // Create a (sub) parser to handle the javadoc comment
> //
> JavaDocParser jdocparser = new JavaDocParser(getInputState());
> jdocparser.content();
> }
> ;
>
> I have created a separate lexer for the actual documentation comment,
> JavaDocLexer, where I have the following end rule for a documentation block:
>
> JAVADOC_CLOSE
>
> : "*/" {selector.pop();} // Pops the stream back to JavaSrcLexer/Parser
>
> ;
>
> This scheme seems to work as far as invoking the JavaDocParser's content
> rule, but when I try to keep the internals of the original ML_COMMENT rule
> (from the original Java Lexer rules), which looks like this:
>
> DOC_CONTENT
>
> : ( /* '\r' '\n' can be matched in one alternative or by matching
> '\r' in one iteration and '\n' in another. I am trying to
> handle any flavor of newline that comes in, but the language
> that allows both "\r\n" and "\r" and "\n" to all be valid
> newline is ambiguous. Consequently, the resulting grammar
> must be ambiguous. I'm shutting this warning off.
> */
> options {
> generateAmbigWarnings=false;
> }
>
> : { LA(2)!='/' }? '*'
>
> | '\r' '\n' {newline();}
> | '\r' {newline();}
> | '\n' {newline();}
> | ~('*'|'\n'|'\r')
>
> )*
> ;
>
> called from the production (in JavaDocParser)
> content
>
> : javaDoc:DOC_CONTENT
>
> {
> System.err.println("Content of block is "+javaDoc.getText());
> }
> ;
>
> I get errors as soon as any character is found. I suspect that the content
> rule for the internals of a comment should be quite different. The
> question is what it should look like. To me, it should accept any character
> sequence until the */ pattern is found (and then consume that as well before
> letting the main parser/lexer regain control).
>
> So how do I find the whole content of the documentation comment, i.e. stop
> parsing in the JavaDocParser/Lexer as soon as I see the "*/" token?
>
> Any ideas to get me started?
>
> Regards
> Vidar
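
The push/pop hand-off between the two lexers described in the quoted message
can be pictured with a small stack. The sketch below is a plain-Java stand-in,
not ANTLR's own selector class. It assumes that push(name) remembers the
currently active lexer and switches to the one called name, and that pop()
switches back. The class name SelectorSketch and the name "doclexer" are
made up for illustration; only "srclexer" appears in the message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a token stream selector; it tracks lexer names
// only and does no real lexing.
final class SelectorSketch {
    private final Deque<String> saved = new ArrayDeque<>();
    private String current;

    SelectorSketch(String initial) {
        this.current = initial;
    }

    // Remember the active lexer and switch to the named one
    // (assumed to be what happens when the opening delimiter is matched).
    void push(String name) {
        saved.push(current);
        current = name;
    }

    // Return to whichever lexer was active before the last push
    // (assumed to be what happens when the closing delimiter is matched).
    void pop() {
        current = saved.pop();
    }

    String current() {
        return current;
    }
}
```

Under that assumption, the name passed to push in JAVADOC_OPEN would be the
name the JavaDoc lexer is registered under, and the pop in the closing rule
hands control back to the source lexer.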