[antlr-interest] Language Independence

Terence Parr parrt at jguru.com
Thu Feb 27 14:26:30 PST 2003


On Thursday, February 27, 2003, at 01:54 PM, Tiller, Michael (M.M.) 
wrote:

> Disclaimer: This message includes some issues I'm having which I have 
> been unable to find a solution for.  Hopefully, all these issues can 
> be addressed without any change to ANTLR.  If not, I'd be interested 
> in discussing whether we might see some changes in ANTLR in the future 
> to address these issues.
>
> With C#, Java and C++ currently supported by ANTLR (and Python on the 
> way?), it seems to me that ANTLR has the great advantage of being a 
> language agnostic tool.  I applaud this.  However, it stops just short 
> of being truly language independent which is a shame.
>
> To give you some background, I have developed a 
> lexer+parser+treewalker.  I'm quite pleased by the fact that the 
> entire arrangement is *NEARLY* language independent.  This is really a 
> shame because it seems to me that it could be made completely language 
> neutral.  To me, there are only three real issues.  The first one may 
> have a solution, the second one seems like it could be addressed 
> easily but the third one (quite deliberately if I'm not mistaken) 
> doesn't have a current workaround:
>
> 1) My first problem is that the language I'm interested in 
> (Modelica) has some rules that are somewhat complicated.  Suffice it 
> to say that I have several optional qualifiers that appear at the 
> start of the rule and I want them as the last children of that AST.  
> So, I use the "!" to suppress automatically including them and then I 
> add a statement like "#cd1->addChild(#f)" or "#cd1.addChild(#f)" 
> depending on whether I am using C++ or Java, respectively.  
> Admittedly, I might be able to avoid this particular manipulation of 
> the tree but it would be nice (and perhaps I'm just not aware) if a 
> language-neutral way existed for this.
>
> 2) In my lexer, I have a similar problem.  I need to process comments 
> and dump whitespace.  If I understand this correctly, this is 
> typically done like this:
>
> WS
>     : (' '
>         | '\t'
>         | '\n' { newline(); }
>         | '\r')
>         {
> //            _ttype = antlr::Token::SKIP;
>             _ttype = Token.SKIP;
>         }

Actually $setType(Token.SKIP) is the "right" way ;)

>     ;
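
For reference, a minimal sketch of the quoted rule rewritten with 
$setType, assuming the Java target.  $setType itself is translated by 
ANTLR, but the argument is still spelled per target (in C++ mode it 
would presumably be antlr::Token::SKIP), so this tidies the action 
without making it fully language neutral:

WS
    : ( ' '
      | '\t'
      | '\n' { newline(); }   // keep the lexer's line count accurate
      | '\r'
      )
      { $setType(Token.SKIP); }   // Java spelling; discard the token
    ;
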
>
> Once again, a slightly different syntax is required depending on 
> whether this is C++ or Java (or C# or Python, etc).
>
> 3) This is the more significant (and in my opinion, downright silly) 
> obstacle to achieving language independence.  My grammar file starts 
> with:
>
> options {
>     language = "Cpp";
> }
>
> Why oh why am I prohibited from making this a command line option?  I 
> know this was discussed before, but I never understood the evil of 
> command-line options.  Perhaps there is a reason why associating the 
> language with the grammar would be useful (when the grammar/treewalker 
> includes actions for example).  But mine are (or could be made) 
> essentially language neutral except for this one line!?!?!

If there are no actions then no problem: cmd-line would work.  Rarely 
do you have no actions, however, and the {...} must be handled in a 
language sensitive way :)

> How about a compromise.  Keep the "language = ..." option, but allow 
> the *default language* to be controlled from the command line.  Then 
> you do not lose or exchange this functionality, you merely augment 
> it?!?  Can we agree to that?

What do you do about actions?

> If these issues are resolved, I will be at peace with my ".g" files 
> because I won't feel like I have needlessly over-constrained their 
> use.

Perhaps just building trees is the answer...that is language 
independent :)

Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
Lecturer in Comp. Sci., University of San Francisco


 



