[antlr-interest] Names of generated files and classes

Mon Aug 6 08:50:27 PDT 2007

On Aug 6, 2007, at 5:22 PM, Johannes Luber wrote:

> Kay Roepke wrote:
>>
>> There cannot be any spaces in the grammar name anyway, so escaping is
>> a non-issue. Probably every filesystem in significant use can work
>> with ANTLR grammar names as is, no escaping necessary.
>
> If directories containing spaces are output, they will be escaped.
> IIRC, Unix allows both backslashes and colons in filenames. Using the
> proprietary format doesn't protect you from such cases and even forces
> you to do the unquoting yourself. Not to mention that an XML output
> format can at least add information without breaking every
> application.

Ok. So you mean that using the -fo or -o options might introduce
characters that need to be escaped; I missed that.
Unfortunately, I think there's a bug there, as I have not been able to
actually specify a -fo directory containing a space character (even with
proper escaping). That's most likely a shortcoming of ANTLR's option
parsing code.

>> Not every tool wants to parse XML. In most cases you specifically do
>> not want to parse XML (and hand-rolling an XML lexer/parser is
>> significantly more work than writing one for the current output!)
>
> It doesn't necessarily have to be XML, but nonetheless the current
> output lacks two columns with the semantic information. At the very
> least, the output needs to be changed to:
>
> TParser.java	: Source	: T.g		: Grammar
> T.tokens	: Tokens	: T.g		: Grammar
> T__.g		: Grammar	: T.g		: Grammar
> U.g		: Grammar	: T.tokens	: Tokens
> U.java		: Source	: U.g		: Grammar
> U.tokens	: Tokens	: U.g		: Grammar
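
For illustration only: a minimal Python sketch of how a consumer might
split one line of the annotated format proposed above. It assumes that
':' never occurs inside a file name, which is exactly the escaping
question raised earlier. DependEntry and parse_annotated_line are
hypothetical names, not part of ANTLR.

```python
from typing import NamedTuple


class DependEntry(NamedTuple):
    """One line of the annotated dependency format proposed above (hypothetical)."""
    target: str
    target_kind: str
    source: str
    source_kind: str


def parse_annotated_line(line: str) -> DependEntry:
    # Columns are separated by ':'; tabs and spaces around them are padding.
    # Assumption: no file name contains ':', otherwise the split is ambiguous.
    fields = [field.strip() for field in line.split(":")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    return DependEntry(*fields)


if __name__ == "__main__":
    entry = parse_annotated_line("U.g\t\t: Grammar\t: T.tokens\t: Tokens")
    print(entry.target, "depends on", entry.source, f"({entry.source_kind})")
```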

Why do we need that information? It seems superfluous to me. ANTLR will
always use .g as the suffix for intermediate grammars (the T__.g file is
scheduled for removal, BTW, i.e. we should remove it after generating
the lexer). Also, the suffix of the tokens file is not likely to ever
change. Everything else is a source file. The tool that invokes ANTLR
can easily know these things.
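
A minimal Python sketch of the alternative argued for here: the calling
tool reads the plain output and derives each file's role from its
suffix. It assumes the current output has one "target : dependency" pair
per line and that ':' never appears in a path; classify,
parse_plain_line and annotate are hypothetical helpers, not ANTLR APIs.

```python
from pathlib import PurePath
from typing import Tuple


def classify(filename: str) -> str:
    """Infer a file's role from its suffix, using the conventions named above."""
    suffix = PurePath(filename).suffix
    if suffix == ".g":
        return "Grammar"
    if suffix == ".tokens":
        return "Tokens"
    return "Source"  # everything else is treated as a generated source file


def parse_plain_line(line: str) -> Tuple[str, str]:
    # Assumed layout "target : dependency"; split on the first ':' only.
    target, _, source = line.partition(":")
    return target.strip(), source.strip()


def annotate(line: str) -> Tuple[str, str, str, str]:
    """Rebuild the four annotated columns on the caller's side."""
    target, source = parse_plain_line(line)
    return target, classify(target), source, classify(source)


if __name__ == "__main__":
    for line in ["TParser.java : T.g", "T.tokens : T.g", "U.g : T.tokens"]:
        print(annotate(line))
```

The point of the sketch is that the two extra columns can be recomputed
by the caller from a fixed suffix table, as long as the suffixes stay
stable.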

>> A tool that uses ANTLR's depend information will be tied to ANTLR
>> anyway, so I don't think hypothetically changing suffixes is an
>> issue. Build tools would need to be updated in that case anyway.
>> Plus, no tool should need to know what the .tokens file is good for.
>
> That is wrong. First of all, tools may want to treat token files
> differently from source files. I can imagine a grammar explorer which
> shows the file dependencies as a tree and wants to color token,
> grammar and source files differently. With my annotated format there
> is no guessing which kind each entry belongs to.

That application would be geared towards ANTLR, and I would expect it to
know quite a bit about ANTLR, for example the name the tokens file will
have, as well as the internal layout of that file.
Even with XML (or any other annotated format) you still have to encode
knowledge in the calling program.
Given that the file suffixes are so unlikely to change, I don't think we
need such a verbose format, IMHO.

> Secondly, I've written a tool which reads token files, so I don't have
> to add the rule names all by myself. The rules are annotated with how
> each one is to be translated into RELAX NG. If I couldn't read token
> files in, I'd certainly have a lot of work.

Well, you can. It's pretty easy to do so: the file name is the grammar
name with a '.tokens' suffix.
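
A minimal Python sketch of locating and reading a tokens file, assuming
it sits in the output directory and holds one NAME=type entry per line
(quoted literals such as '=' included). tokens_path, parse_tokens and
read_tokens are hypothetical helpers, not an ANTLR API.

```python
from pathlib import Path
from typing import Dict


def tokens_path(grammar_name: str, output_dir: str = ".") -> Path:
    # The tokens file is named after the grammar, with a ".tokens" suffix.
    return Path(output_dir) / f"{grammar_name}.tokens"


def parse_tokens(text: str) -> Dict[str, int]:
    """Map token names (or quoted literals) to their token types."""
    tokens: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the LAST '=' so a literal like '=' keeps its own '='.
        name, sep, value = line.rpartition("=")
        if not sep or not name:
            raise ValueError(f"malformed tokens line: {raw!r}")
        tokens[name] = int(value)
    return tokens


def read_tokens(grammar_name: str, output_dir: str = ".") -> Dict[str, int]:
    return parse_tokens(tokens_path(grammar_name, output_dir).read_text())
```

Splitting with rpartition rather than split keeps an entry such as
'='=6 intact, which a naive split on the first '=' would break.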

> Best regards,
> Johannes Luber

best,

-k
-- 
Kay Röpke
http://classdump.org/