[antlr-interest] Re: ITLS (was: Translators Should Use Tree Grammars)

Sun Nov 21 13:57:22 PST 2004

John,
Interesting read.
I don't know if my reply has anything to do with what you said,
but here goes anyway :)

When deciding on an approach (meaning a programming language,
tools, etc) to a programming problem, I first try to describe some
examples in normal english. So, with a C-to-Java translator,
I start saying "I'm going to have lots of specifications like the
following":

"Replace any C main() functions with equivalent Java main() functions".

I then try to describe an algorithm, also in English:

"Find any main() function that return 'int' (or no return type
is specified) and take two arguments - an 'int' and an
array of char pointers. Replace it with this signature:
'public static void main(String[] args)', and replace all references
to 'argc' with args.length and arg[] with args[]. Also,
deal with the off-by-one problem with args[]."

Then I look for the language or tool that lets me specify
my algorithm in a way that most closely matches my English
description.

For lexers and parsers, a BNF-like spec best matches
the English description of the problem (lexer: "A Token
is one of the following sequences of characters", for a parser,
"A package definition is the token 'package' followed by
an ID token, followed by zero or more "."-and-ID-token, followed
by a ';'").

Now sometimes a whole different way of approaching the problem
can work out in the end. For example, you might end up replacing
a lot of C++/Java code with a
a state machine or a set of "business rules" or whatever.
But requiring the application developer to change his mindset
in such a major way is a major burden. It must be extremely
clear at the end that there's a large benefit, as the
developer will constantly be struggling to re-phrase the
'natural' algorithm that first pops into his head, into the
new mechanism.

With lexers and parsers, the natural way to think about the
problem is clearly specified with a BNF-like grammar. But
with a complex AST-to-AST transformation, the natural thought
process is to have a sequence of 'rules' like:
"here are the steps to change the main() function".
This is much more intuitive then a set of actions embedded
in the BNF-like grammar, as the thought process there is
"Here are the things that I should do to translate 
a C FUNCTION_DEF to a Java one."

I have a rule to change a main() function, another to change
a printf() function, and dozens (or hundreds) of others.
These rules don't have a lot in common with each other, and 
I'd hate to have an action for each one at the FUNCTION_DEF
node in a grammar. I'd also hate to have a single
handleAllFunctionDefinitions() method that's called at each
function declaration. That's just not a natural way to think
about the problem.

As an aside, I guess I'm just a non-conformist.
I know full well that the "right" way to do a translator
is to parse into an AST and then do AST-to-AST conversion.
And I tried that for a few months and then abandoned it.
I'm just not smart enough to take a snippet of code,
parse it into an AST in my head, figure out what the "output"
AST should look like in my head, and write the code to
change it. Instead, I just want to write:
int main(int argc, char *argv[]) { --> public static void main(String
args) {

And so that's what I did. I started over, removed all the AST stuff,
and spent the next two years working
with token streams directly, building rules like that one.
It's horribly inefficient and error prone, but I do spend
95% of my time thinking about what the rules should be.
With the AST stuff, I spent probably 80% of my time figuring out
how to implement the rules and only 20% on what the rules should
be.

And please don't dismiss me as "some lunatic that doesn't
even believe in symbol tables", I'm just not believing in them
for certain applications. And the performance requirements for
a one-time source-to-source translator are completely different
from those for a compiler or interpreter.

Andy

Yahoo! Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
    antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
    http://docs.yahoo.com/info/terms/