[antlr-interest] Language Independence

Thu Feb 27 13:54:50 PST 2003

Disclaimer: This message includes some issues I'm having which I have been unable to find a solution for.  Hopefully, all these issues can be addressed without any change to ANTLR.  If not, I'd be interested in discussing whether we might see some changes in ANTLR in the future to address these issues.

With C#, Java and C++ current supported by ANTLR (and Python on the way?), it seems to me that ANTLR has the great advantage of being a language agnostic tool.  I applaud this.  However, it stops just short of being truly language independent which is a shame.

To give you some background, I have developed a lexer+parser+treewalker.  I'm quite pleased by the fact that the entire arrangement is *NEARLY* language independent.  This is really a shame because it seems to me that it could be made completely language neutral.  To me, there are only three real issues.  The first one may have a solution, the second one seems like it could be addressed easily but the third one (quite deliberately if I'm not mistaken) doesn't have a current workaround:

1) My first problem is that in the language I'm interested in (Modelica), has some rules that are somewhat complicated.  Suffice it to say that I have several optional qualifiers that appear at the start of the rule and I want them as the last children of that AST.  So, I use the "!" to suppress automatically including them and then I add a statement like "#cd1->addChild(#f)" or "#cd1.addChild(#f)" depending on whether I am using C++ or Java, respectively.  Admittedly, I might be able to avoid this particular manipulation of the tree but it would be nice (and perhaps I'm just not aware) if a language-neutral way existed for this.

2) In my lexer, I have a similar problem.  I need to process comments and dump whitespace.  If I understand this correctly, this is typically done like this:

WS
    : (' '
        | '\t'
        | '\n' { newline(); }
        | '\r')
        {
//            _ttype = antlr::Token::SKIP;
            _ttype = Token.SKIP;
        }
    ;

Once again, a slightly different syntax is required depending on whether this is C++ or Java (or C# or Python, etc).

3) This is the more significant (and in my opinion, downright silly) obstacle to achieving language independence.  My grammar file starts with:

options {
    language = "Cpp";
}

Why oh why am I prohibited from making this a command line option?  I know this was discussed before, but I never understood the evil of command-line options.  Perhaps there is a reason why associated the language with the grammar would be useful (when the grammar/treewalker includes actions for example).  But mine are (or could be made) essentially language neutral except for this one line!?!?!

How about a compromise.  Keep the "language = ..." option, but allow the *default language* to be controlled from the command line.  Then you do not lose or exchange this functionality, you merely augment it?!?  Can we agree to that?

If these issues are resolved, I will be at peace with my ".g" files because I won't feel like I have needlessly over-constrained their use.

--
Mike

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/