[antlr-interest] target language independent action code

Mon Jan 21 13:27:45 PST 2008

Hi Mark,

> > hi,
> >
> > I know this topic was discussed a couple of times here ...
> >
> > But as far as I know there is no solution available right now
> > (possibly apart from Loring Cramer's yggdrasil).
> >
> > I think target language independent action code would be of great
> > help because:
> >
> > 1. ANTLR provides a steadily growing foundation of grammars for
> > various languages (which is very cool). Unfortunately, it's almost
> > certain that the grammar targets a different language ...
> > 2. Action code clutters the readability of the grammar - especially
> > if it's in a target language that you don't know.
>
>Hello Arnulf,
>
>The normal approach to avoid cluttering the grammar is to just
>have one line of action code that calls a method in the target
>language.

Ok, I know that trick ;-) But do function invocations look the same 
in every target language? I don't think so.

>
> > Because ANTLR changes a lot over time, action code should be embedded
> > into ANTLR directly with "on board" tools.
>
>For a parser for a large language the ANTLR generated parser file
>is already too large (2.5MB in my case) for the Netbeans debugger
>to open when it only has at most one line in each action.  To
>then go and embed the action code would stop me from being able to
>debug it at all.  Even if the debugger could handle it, there is
>no way I want to go searching through megabytes of generated
>parser code looking for the place to set my breakpoint.

Good point. I assume it is no problem to wrap the in-place action 
with a function call, e.g.:

myrule : ID [ DictAdd(ID) ] ;

could translate to

myrule_action1(ID) { dictionary.add(ID); }

and a call of myrule_action1 in the code for rule myrule.
Then you have a chance to place a breakpoint there.
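
As a rough Java sketch of what the generated code could look like: the class name MyRuleParser, the String parameter standing in for the ID token, and the list used as a dictionary are all assumptions for illustration, not actual ANTLR output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of generated parser code; not real ANTLR output.
class MyRuleParser {
    // Stand-in for a dictionary that the runtime might provide.
    final List<String> dictionary = new ArrayList<>();

    // Code for the rule "myrule : ID [ DictAdd(ID) ] ;".
    // The rule body only calls the wrapper; it contains no action code itself.
    void myrule(String idText) {
        // ... matching of the ID token would happen here ...
        myrule_action1(idText);
    }

    // Named wrapper for the in-place action: a breakpoint can be set here
    // without searching through the rest of the generated parser.
    void myrule_action1(String id) {
        dictionary.add(id);
    }
}
```

The wrapper gives each action a small, named method, so a debugger only needs to find myrule_action1 rather than a position inside the rule code.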

> > So why not use these wonderful string templates?
> >
> > Instead of writing
> >
> > { myDict.add($ID.text()); }
> >
> > one could write for instance
> >
> > [ DictAdd(ID) ]
> >
> > which ANTLR could translate on the fly to target language code at
> > that position.
>
>In practice the action code for a compiler for a large
>language is thousands of lines of code just for entering the
>information into a symbol table, which for an object orientated
>language is a DAG (Directed Acyclic Graph), for looking up
>information in the symbol table, etc.  It would be inconvenient
>to develop and debug this using string templates.

Yes, but exactly that is the best argument for providing such a base 
functionality!
I have been a little imprecise here: I do not suggest that everything 
needs to be done with string templates.
Especially a data structure like a dictionary should be part of the 
runtime; members and functions that come with the BaseParser (in its 
respective target language - needless to say).
The string template just triggers those member functions.

Only specialized action code that is not covered by base 
functionality needs to be provided by the grammar designer.
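
To make the idea more concrete, here is a minimal sketch in Java. A plain map stands in for a string template group (this is not the StringTemplate API), and the member function dictAdd as well as the two emitted call strings are made-up examples of what a base parser in each target language might offer.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a plain map stands in for a string template group.
class ActionTemplates {
    private final Map<String, String> templates = new HashMap<>();

    // Register the target-language text for one abstract action;
    // "%s" marks where the action's argument is substituted.
    void define(String action, String pattern) {
        templates.put(action, pattern);
    }

    // Expand an abstract action such as DictAdd(ID) into target code.
    String expand(String action, String arg) {
        String pattern = templates.get(action);
        if (pattern == null) {
            throw new IllegalArgumentException("no template for " + action);
        }
        return String.format(pattern, arg);
    }

    // Hypothetical template group for a Java target.
    static ActionTemplates javaTarget() {
        ActionTemplates t = new ActionTemplates();
        t.define("DictAdd", "dictAdd(%s.getText());");
        return t;
    }

    // Hypothetical template group for a C++ target.
    static ActionTemplates cppTarget() {
        ActionTemplates t = new ActionTemplates();
        t.define("DictAdd", "this->dictAdd(%s->getText());");
        return t;
    }
}
```

The grammar keeps the single abstract action; only the template group changes per target, and the emitted text is just a call into the runtime.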

But the question is:
Are concepts like dictionaries, lists, maps etc. really so different 
between different grammars?
Ok, there are scoping rules and the like, but in principle they do the same thing.
I don't know, I never designed such a huge grammar.

But take a look into smaller grammars like CMinus.g
I think it does not take too much to make that target language agnostic.

>
> > Then the writer of the grammar needs to provide a string template
> > group (with a template "DictAdd") which performs the translation to
> > "his" target language.
> > This way targeting a different language amounts to rewriting the
> > string template group.
> > This does not alter the original grammar and will hopefully be
> > posted :-)
> >
> > The target language folks even could provide a minimal toolset for
> > dictionaries and the like.
>
>A dictionary is insufficient for a symbol table for an object
>orientated language.  It would be impractical for the target language
>developers to anticipate the symbol table language requirements for
>every conceivable language.

OK, that's bad news. My memory of compiler construction lessons is
fading slowly, but what makes it hard in your opinion to build some
sort of universal symbol table? The scoping rules? Keywords versus names?
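
For reference, the "dictionary plus scoping rules" idea could be sketched roughly like this in Java. The class and method names are invented for illustration, and it only handles nested scopes; it does not attempt the DAG-shaped table described above for object oriented languages.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Minimal scoped symbol table; all names here are made up for illustration.
class ScopedSymbolTable {
    // Head of the deque is the innermost (current) scope.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        enterScope(); // the global scope always exists
    }

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void exitScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot exit the global scope");
        }
        scopes.pop();
    }

    // Define a name in the current scope, shadowing any outer definition.
    void define(String name, String type) {
        scopes.peek().put(name, type);
    }

    // Search from the innermost scope outward; null if the name is unknown.
    String resolve(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }
}
```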

> > If there is a good collection of tools,
> > the action code gets structured, documented and well known over
> > time.
> >
> > What do you think?
>
>In my dreams I wish there was some magic way to automatically translate
>the Java ANTLR runtime and all my Java action code into C++ sometime
>in the future when the C++ runtime is available.

Don't know if that provides any comfort to you but I'm currently 
trying to port the Java runtime to C++. Not very demanding, I just 
rename keywords etc.
As far as I could see there are no Java specific features in there.
String processing and pointers vs. objects are annoying but no show stoppers.

Every time I replace a Java Object with a C++ pointer I get a steadily
growing feeling that I need to test that sometime :-)
And realizing that most example grammars use Java as target language 
(and action code) led me to this posting ...

>Back in the real world, what I am doing at the moment is I develop all
>of my Java action code modelled in the freeware UML CASE tool called
>BOUML:
>
>http://bouml.free.fr/
>
>Then I hope that some time in the future that some other kind hearted
>masochists (not me, sorry, I am already one level of indirection away
>from real work) will develop a C++ ANTLR runtime including tree wizard,
>and C++ string template.

Ok, seems that I can't count on you :-)

>Thanks, Mark
>
>--

Bye!