[antlr-interest] Documenting grammars

Mon Mar 23 08:48:18 PDT 2009

Jim Idle wrote:
> Sam Barnett-Cormack wrote:
>> Hi all,
>>
>> So, we use doc-comments (/** */) in our grammars. However, as far as I
>> can tell, there's no way to auto-process these and generate nice docs.
>> Does anyone know of one?
>>
>> *If* the answer is no, I'm interested in using some of my spare time to
>> create one. I've already looked into adapting the GPL source for javadoc
>> and the standard doclet. 
> It would probably be easier/better to take the v3 grammar and write a 
> front end to doxygen,

Well, from a developing-to-scratch-an-itch point of view, I'm less
bothered about doxygen because I don't use it ;) and I also have no idea
how one writes a language reader for doxygen. It may, however, actually
be easier than for javadoc - doxygen is already language-agnostic, and
probably won't mind using a new set of terminology (grammar and rule
rather than class and method, for instance).

> or optionally spit out a pseudo class that javadoc 
> can use.

If you wanted to produce something that didn't call grammars "classes"
and rules "methods", you would need to hack javadoc more than that, to
introduce the new documentable elements. Javadoc as it stands is
hard-wired to use certain concepts - package, class, method, field,
annotation, and so on.

> Perhaps we should really be making the DOC_COMMENT tokens go 
> through to the generated target. We would have to pass the token through 
> a function for the code generator target though so it could adapt it to 
> the target language. While /** xxx */ will work for most, it probably 
> does not work for all and I am not convinced that it is that useful in 
> the target language anyway (though I can see how class documentation is, 
> but that is already covered if you use separate lexer and parser 
> grammars which is an easy thing to do).

For me, it makes sense to document the grammar separately - a
description of the token, say, isn't really appropriate as the method
documentation for the mTOKEN methods in the lexer.
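
To make "document the grammar separately" a bit more concrete, here is a
rough sketch of pulling the doc-comments straight out of the grammar
source and keying them by rule or token name, without going anywhere
near the generated code. It is a regex approximation, not a real ANTLR
parse, and the names (DOC_RULE, clean, extract_rule_docs) are made up
for illustration. It ignores rule arguments, strings that happen to
contain comment markers, and so on.

```python
import re

# Hypothetical sketch: pair each /** ... */ doc-comment with the rule or
# token name that immediately follows it. Not a real ANTLR parser.
DOC_RULE = re.compile(
    r"/\*\*(?P<doc>(?:(?!\*/).)*)\*/\s*"  # comment body, never crossing '*/'
    r"(?:fragment\s+)?"                   # optional lexer 'fragment' modifier
    r"(?P<name>[A-Za-z_]\w*)\s*"          # rule or token name
    r"(?:returns\s*\[[^\]]*\]\s*)?:",     # optional returns clause, then ':'
    re.DOTALL,
)

def clean(doc):
    """Strip the leading '*' decoration and join the comment into one line."""
    lines = [line.strip().lstrip("*").strip() for line in doc.splitlines()]
    return " ".join(line for line in lines if line)

def extract_rule_docs(grammar_text):
    """Return {name: doc_text} for every rule that has a doc-comment."""
    return {m.group("name"): clean(m.group("doc"))
            for m in DOC_RULE.finditer(grammar_text)}

SAMPLE = """
/** Simple expression grammar. */
grammar Expr;

/** A sum of one or more integers. */
expr : INT ('+' INT)* ;

/**
 * An unsigned integer literal.
 */
INT : '0'..'9'+ ;

WS : ' '+ { skip(); } ;
"""

for name, doc in extract_rule_docs(SAMPLE).items():
    print(name, "-", doc)
```

The comment on the grammar header is deliberately skipped here, since
it is not followed by a rule definition; a real tool would want to
treat it as the grammar-level description.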

I'm talking about distinct documentation of the grammar, which could
(and probably should) even be target-language independent. I would
probably want to add a concept of "package", not present in ANTLR, in
order to group grammars together - so the lexer, parser, treefilter and
treeparser for one language are documented in a connected way,
separate from other languages, rather than having all grammars either
separate or all bundled together.
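
As a sketch of what that grouping might look like: the "package" label
below is something the user of the doc tool would have to supply (ANTLR
itself has no such notion), and the function names are hypothetical.
The only thing taken from ANTLR v3 is the header form, where an
optional "lexer", "parser" or "tree" precedes "grammar Name;".

```python
import re
from collections import defaultdict

# Matches an ANTLR v3 grammar header such as 'tree grammar ExprWalker;'.
HEADER = re.compile(r"(?:\b(lexer|parser|tree)\s+)?\bgrammar\s+(\w+)\s*;")

def grammar_header(text):
    """Return (kind, name); a header with no prefix is a combined grammar."""
    m = HEADER.search(text)
    if m is None:
        raise ValueError("no grammar header found")
    return (m.group(1) or "combined", m.group(2))

def group_by_package(sources):
    """Group grammars under caller-supplied package labels.

    sources is an iterable of (package, grammar_text) pairs. The package
    label is an assumption of this sketch, not an ANTLR concept.
    """
    packages = defaultdict(list)
    for package, text in sources:
        packages[package].append(grammar_header(text))
    return {pkg: sorted(entries) for pkg, entries in packages.items()}

if __name__ == "__main__":
    sources = [
        ("expr", "lexer grammar ExprLexer;"),
        ("expr", "parser grammar ExprParser;"),
        ("expr", "tree grammar ExprWalker;"),
        ("sql", "grammar Sql;"),
    ]
    for pkg, grammars in group_by_package(sources).items():
        print(pkg, grammars)
```

A doc generator could then emit one index page per package, with the
lexer, parser and tree grammars for a language cross-linked from it.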

Of course, given that language recognition is a different paradigm from
OOP, it might be more appropriate to build a tool completely from
scratch, but it's better to see if there's anything to adapt or tweak
first.

Sam