[antlr-interest] Language Independence
Terence Parr
parrt at jguru.com
Thu Feb 27 14:26:30 PST 2003
On Thursday, February 27, 2003, at 01:54 PM, Tiller, Michael (M.M.)
wrote:
> Disclaimer: This message includes some issues I'm having which I have
> been unable to find a solution for. Hopefully, all these issues can
> be addressed without any change to ANTLR. If not, I'd be interested
> in discussing whether we might see some changes in ANTLR in the future
> to address these issues.
>
> With C#, Java and C++ currently supported by ANTLR (and Python on the
> way?), it seems to me that ANTLR has the great advantage of being a
> language agnostic tool. I applaud this. However, it stops just short
> of being truly language independent which is a shame.
>
> To give you some background, I have developed a
> lexer+parser+treewalker. I'm quite pleased by the fact that the
> entire arrangement is *NEARLY* language independent. This is really a
> shame because it seems to me that it could be made completely language
> neutral. To me, there are only three real issues. The first one may
> have a solution, the second one seems like it could be addressed
> easily but the third one (quite deliberately if I'm not mistaken)
> doesn't have a current workaround:
>
> 1) My first problem is that the language I'm interested in
> (Modelica) has some rules that are somewhat complicated. Suffice it
> to say that I have several optional qualifiers that appear at the
> start of the rule and I want them as the last children of that AST.
> So, I use the "!" to suppress automatically including them and then I
> add a statement like "#cd1->addChild(#f)" or "#cd1.addChild(#f)"
> depending on whether I am using C++ or Java, respectively.
> Admittedly, I might be able to avoid this particular manipulation of
> the tree but it would be nice (and perhaps I'm just not aware) if a
> language-neutral way existed for this.
>
> 2) In my lexer, I have a similar problem. I need to process comments
> and dump whitespace. If I understand this correctly, this is
> typically done like this:
>
> WS
>     :   ( ' '
>         | '\t'
>         | '\n' { newline(); }
>         | '\r'
>         )
>         {
>             // _ttype = antlr::Token::SKIP;
>             _ttype = Token.SKIP;
>         }
Actually $setType(Token.SKIP) is the "right" way ;)
> ;
>
> Once again, a slightly different syntax is required depending on
> whether this is C++ or Java (or C# or Python, etc).
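
For reference, here is a sketch of the WS rule above rewritten with $setType, which ANTLR 2 translates into the assignment to the token type. It assumes Java output. The argument is still spelled in the target language (Token.SKIP in Java, antlr::Token::SKIP in C++), so this form removes the direct use of _ttype but is not fully language neutral:

```antlr
// Whitespace rule using $setType instead of assigning _ttype directly.
// Token.SKIP is the Java spelling; a C++ grammar would write
// antlr::Token::SKIP here.
WS
    :   ( ' '
        | '\t'
        | '\n' { newline(); }   // keep the lexer's line counter in sync
        | '\r'
        )
        { $setType(Token.SKIP); }
    ;
```
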
>
> 3) This is the more significant (and in my opinion, downright silly)
> obstacle to achieving language independence. My grammar file starts
> with:
>
> options {
>     language = "Cpp";
> }
>
> Why oh why am I prohibited from making this a command line option? I
> know this was discussed before, but I never understood the evil of
> command-line options. Perhaps there is a reason why associating the
> language with the grammar would be useful (when the grammar/treewalker
> includes actions for example). But mine are (or could be made)
> essentially language neutral except for this one line!?!?!
If there are no actions then no problem: cmd-line would work. Rarely
do you have no actions, however, and the {...} must be handled in a
language sensitive way :)
> How about a compromise? Keep the "language = ..." option, but allow
> the *default language* to be controlled from the command line. Then
> you do not lose or exchange this functionality, you merely augment
> it?!? Can we agree to that?
What do you do about actions?
> If these issues are resolved, I will be at peace with my ".g" files
> because I won't feel like I have needlessly over-constrained their
> use.
Perhaps just building trees is the answer...that is language
independent :)
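
As a rough sketch of what "just building trees" could look like for point 1: ANTLR 2's tree-construction syntax, #(...) and #[...], is translated by the code generator rather than copied through verbatim, so an action written only in that syntax should not need per-target spelling. The rule and token names below (class_definition, class_body, CLASS_DEF, "final") are hypothetical and not taken from the Modelica grammar. The sketch also changes the tree shape: it hangs both parts under an imaginary root node instead of calling addChild on an existing node.

```antlr
// Assumes buildAST = true and an imaginary token declared elsewhere:
//   tokens { CLASS_DEF; }
// The "!" on the rule turns off automatic tree construction, so the
// action decides the order of the children.
class_definition!
    :   (f:"final")? cd:class_body
        // Imaginary root first, then the body, then the optional
        // qualifier as the last child (skipped if it was absent).
        { #class_definition = #(#[CLASS_DEF, "CLASS_DEF"], #cd, #f); }
    ;
```

A tree walker would then match #(CLASS_DEF ...) rather than the original root, so this only helps if that change of shape is acceptable.
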
Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Lecturer in Comp. Sci., University of San Francisco