[antlr-interest] Names of generated files and classes

Mon Aug 6 08:22:25 PDT 2007

Kay Roepke wrote:
> 
> On Aug 6, 2007, at 3:18 PM, Johannes Luber wrote:
> 
>> And no escaping for spaces is necessary. (Are '\' also escaped?) I know
>> that Ter doesn't like XML that much, but as it is meant for programs
>> anyway, I think it is the best and simplest way.
> 
> There cannot be any spaces in the grammar name anyway, so escaping is a
> non-issue.
> Probably every filesystem in significant use can work with ANTLR grammar
> names as is, no escaping necessary.

If directories with spaces in their names are output, the spaces will be
escaped. IIRC, Unix allows both backslashes and colons in filenames.
Using a proprietary format doesn't protect you from such cases and even
forces you to do the unquoting yourself. Not to mention that an XML
output format can at least be extended with new information without
breaking every application.

> Not every tool wants to parse XML. In most cases you specifically do not
> want to parse XML (and handrolling an XML lexer/parser is significantly
> more work than writing one for the current output!)

It doesn't necessarily have to be XML, but the current output still
lacks two columns carrying the semantic information. At the very least,
the output needs to be changed to:

TParser.java	: Source	: T.g		: Grammar
T.tokens	: Tokens	: T.g		: Grammar
T__.g		: Grammar	: T.g		: Grammar
U.g		: Grammar	: T.tokens	: Tokens
U.java		: Source	: U.g		: Grammar
U.tokens	: Tokens	: U.g		: Grammar
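
As a rough sketch of how little parsing this needs, here is how a tool
might read one such line in Python. I am assuming the columns mean
target, kind of target, dependency, kind of dependency, and that the
separator is a colon with whitespace on both sides. The names
DependEntry and parse_depend_line are made up for this example and are
not part of ANTLR.

```python
import re
from typing import NamedTuple


class DependEntry(NamedTuple):
    """One line of the annotated dependency output (hypothetical type)."""
    target: str
    target_kind: str
    source: str
    source_kind: str


# Assumption: columns are separated by a colon with whitespace on both
# sides, so a colon embedded in a file name does not split a column.
_SEPARATOR = re.compile(r"\s+:\s+")


def parse_depend_line(line: str) -> DependEntry:
    """Split one annotated line into its four columns."""
    fields = _SEPARATOR.split(line.strip())
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    return DependEntry(*fields)


entry = parse_depend_line("TParser.java\t: Source\t: T.g\t\t: Grammar")
```

Because the kinds are explicit columns, the tool never has to look at a
file suffix to decide what it is dealing with.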

> A tool that uses the depend information of ANTLR will be tied to ANTLR
> anyway, so I don't think we have an issue of hypothetically changing
> suffixes. Build tools would need to be updated then in any case.
> Plus, no tool should need to know what the .tokens file is good for.

That is wrong. First of all, tools may want to treat token files
differently from source files. I can imagine a grammar explorer that
shows the file dependencies as a tree and wants to color token, grammar,
and source files differently. With my annotated format there is no
guessing which kind each entry belongs to.
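
To illustrate the grammar explorer idea, here is a minimal Python sketch
that takes the six annotated rows from my proposed output and renders
them as an indented tree, with a color looked up per kind. The color
table and the function names are invented for the example; a real
explorer would of course draw this in a GUI.

```python
from collections import defaultdict

# Hypothetical color scheme, keyed by the kind column.
KIND_COLORS = {"Grammar": "green", "Tokens": "orange", "Source": "blue"}

# (target, target kind, dependency, dependency kind), as in the
# proposed output format.
ROWS = [
    ("TParser.java", "Source", "T.g", "Grammar"),
    ("T.tokens", "Tokens", "T.g", "Grammar"),
    ("T__.g", "Grammar", "T.g", "Grammar"),
    ("U.g", "Grammar", "T.tokens", "Tokens"),
    ("U.java", "Source", "U.g", "Grammar"),
    ("U.tokens", "Tokens", "U.g", "Grammar"),
]


def build_graph(rows):
    """Return (file -> kind, dependency -> list of files built from it)."""
    kinds = {}
    children = defaultdict(list)
    for target, target_kind, source, source_kind in rows:
        kinds[target] = target_kind
        kinds[source] = source_kind
        children[source].append(target)
    return kinds, dict(children)


def render_tree(root, kinds, children, depth=0):
    """Return the tree below root as indented lines (assumes no cycles)."""
    kind = kinds[root]
    color = KIND_COLORS.get(kind, "default")
    lines = ["  " * depth + f"{root} ({kind}, {color})"]
    for child in children.get(root, []):
        lines.extend(render_tree(child, kinds, children, depth + 1))
    return lines


kinds, children = build_graph(ROWS)
tree_lines = render_tree("T.g", kinds, children)
print("\n".join(tree_lines))
```

The kind of every node comes straight from the annotated columns, so
nothing is inferred from file names.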

Secondly, I've written a tool that reads token files, so I don't have
to add the rule names by hand. Each rule is annotated with how it is
to be translated into RELAX NG. If I couldn't read token files in, I'd
certainly have a lot of work.
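
For what it's worth, reading a token file is not hard. The following
Python sketch is not my actual tool, just an illustration, and it
assumes the file holds one name=number pair per line, where the name is
either a token name or a quoted literal. The function name parse_tokens
is made up.

```python
def parse_tokens(text: str) -> dict[str, int]:
    """Map each token name (or quoted literal) to its token type number.

    Assumes one `name=number` pair per line; blank lines are skipped.
    """
    tokens = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Split on the LAST '=' so a literal such as '=' stays intact.
        name, separator, number = line.rpartition("=")
        if not separator or not name or not number.isdigit():
            raise ValueError(f"not a name=number line: {raw_line!r}")
        tokens[name] = int(number)
    return tokens


sample = "ID=4\nINT=5\n'='=6\n"
token_types = parse_tokens(sample)
```

A tool like mine then only needs the keys of that mapping to know which
names it has to provide annotations for.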

Best regards,
Johannes Luber