[antlr-interest] Parsing documentation comments (with nesting!) (v3)
Rick Mann
rmann at latencyzero.com
Wed Feb 21 19:36:59 PST 2007
Antlr v3b6.
I've been working on a tool to create a symbol database for the D
programming language. This means that I don't need a complete parser,
just enough of one to identify a few "global" symbol definitions. I'm
doing okay with some language basics, but I'm running into trouble
parsing comments. I have a couple of big questions.
If you're unfamiliar, D is a programming language that looks a lot
like C++ and Java. In particular, it has multiline comments delimited
by '/*' and '*/'. It has "to-EOL" comments that start with '//' and
go to the end of the line.
It also has nesting multiline comments. You can delimit a comment
with '/+' and '+/', and nest these arbitrarily deeply.
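To make the nesting rule concrete, here is a minimal sketch in plain Java rather than ANTLR grammar. It assumes the whole source is available as a String, and it tracks depth with a counter. The class and method names are hypothetical and are not part of ANTLR or any D tooling.

```java
/** Hypothetical helper for illustration; not part of ANTLR or any D tool. */
final class NestingCommentScanner {

    /**
     * Given the index of an opening "/+", returns the index just past the
     * matching "+/", or -1 if the comment is never closed.
     */
    static int endOfNestingComment(String text, int start) {
        if (!text.startsWith("/+", start)) {
            throw new IllegalArgumentException("no '/+' at index " + start);
        }
        int depth = 0;
        int i = start;
        while (i < text.length()) {
            if (text.startsWith("/+", i)) {
                depth++;            // one level deeper
                i += 2;
            } else if (text.startsWith("+/", i)) {
                depth--;            // one level shallower
                i += 2;
                if (depth == 0) {
                    return i;       // the outermost comment just closed
                }
            } else {
                i++;                // ordinary comment character
            }
        }
        return -1;                  // ran off the end: unterminated comment
    }

    public static void main(String[] args) {
        String text = "/+ a /+ b +/ c +/ x";
        int end = endOfNestingComment(text, 0);
        System.out.println(text.substring(0, end));
    }
}
```

The counter does the job that a recursive rule would do in a grammar: each '/+' must be balanced by its own '+/' before the outer comment can end.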
A variant of each of these three denotes a Documentation Comment. If
a comment starts with '/**', '/++' or '///', it is considered
documentation, and applies to the symbols defined "nearby" (the
specific rules are not important). The comment itself has a structure
that would be nice to include in the overall grammar.
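The prefix rule above can be sketched as a small Java check. It assumes the comment text has already been isolated, including its opening delimiter. The names are hypothetical, and degenerate cases such as an empty '/**/' are not treated specially here.

```java
/** Hypothetical helper for illustration; applies only the prefix rule stated above. */
final class DocCommentClassifier {

    /** Returns true if the comment text starts with "/**", "/++" or "///". */
    static boolean isDocComment(String comment) {
        return comment.startsWith("/**")
            || comment.startsWith("/++")
            || comment.startsWith("///");
    }

    public static void main(String[] args) {
        System.out.println(isDocComment("/** documented */"));
        System.out.println(isDocComment("/* plain */"));
    }
}
```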
At the most basic level, I'd like to be able to get at the content of
a regular multiline comment. The beta book shows an example like this:
    COMMENT
        : '/*' ( options {greedy=false;} : . )* '*/'
        ;
I've tried this, and it works fine, but I can't get at the text of
the comment. I tried labeling the subrule, but it didn't like that.
So I tried this:
    COMMENT
        : '/*'! COMMENTTEXT '*/'!
          { System.out.println("Found a comment [" + $COMMENTTEXT.text + "]"); }
        ;

    fragment
    COMMENTTEXT
    options
    {
        greedy = false;
    }
        : .*
        ;
But I get "The following alternatives are unreachable: 1".
(Keep in mind, my grammar will eventually generate an AST, but right
now has code to help me debug and learn).
I'd like to parse the structure of the Doc Comments, which is
somewhat line-oriented, so getting each line in turn would be helpful.
Question 1: How would I write a grammar to accommodate this need?
-------------
Question 2: How can I write a grammar to essentially skip a function
body? In D you can both declare and define functions, just like in C:
    int foo(char x, int, long y);
or
    int bar(char x, int, long y)
    {
    }
For my purposes, I don't care what happens inside the {}, but since
braces can nest arbitrarily deeply, I need to parse through it
properly. I'm having trouble understanding how to avoid the left
recursion that makes ANTLR choke. In any case, I suspect this grammar
will look just like the grammar for the nesting comments above,
except that I can throw out anything inside the body.
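As an illustration of the brace skipping described above, here is a minimal sketch in plain Java, not ANTLR grammar. It counts braces and discards everything between them. It is a deliberate simplification: braces inside string literals, character literals, or comments are counted like any other, which a real scanner would have to handle. The class and method names are hypothetical.

```java
/** Hypothetical helper for illustration; ignores strings and comments. */
final class BraceSkipper {

    /**
     * Given the index of an opening '{', returns the index just past the
     * matching '}', or -1 if the block is never closed.
     */
    static int endOfBlock(String text, int open) {
        if (open < 0 || open >= text.length() || text.charAt(open) != '{') {
            throw new IllegalArgumentException("no '{' at index " + open);
        }
        int depth = 0;
        for (int i = open; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;                    // entering a nested block
            } else if (c == '}' && --depth == 0) {
                return i + 1;               // the outermost block just closed
            }
        }
        return -1;                          // unbalanced braces
    }

    public static void main(String[] args) {
        String src = "int bar(char x, int, long y)\n{\n  if (x) { y(); }\n}\nint z;";
        int open = src.indexOf('{');
        int end = endOfBlock(src, open);
        System.out.println(src.substring(end).trim());
    }
}
```

Scanning can resume at the returned index, so the body contents never need to be parsed.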
I'd really appreciate any help anyone can give. Thank you!
--
Rick