[antlr-interest] philosophy about translation

Andy Tripp antlr at jazillian.com
Wed Oct 11 11:01:36 PDT 2006


Loring Craymer wrote:

> Andy--
>
> I deliberately chose your message to respond to because it captured 
> the pragmatic viewpoint--"I had a problem; I solved it; I did not need 
> to use trees"--quite effectively.  That is quite a bit different from 
> the more usual "I tried to use ANTLR's tree facilities and discovered 
> that writing them by hand and using a visitor is easy and clearly is 
> the ONE TRUE SOLUTION to language processing" that appears in the 
> group every so often.  (Although, truth be told, some of your early 
> messages to the group were along that vein, but you have since worked 
> on problems that require multi-pass recognition support and have 
> broadened your horizons.)
>
> "Thinking in trees" does not come automatically.  It is like learning 
> LISP or Forth or one of the functional languages (and, for that 
> matter, object-oriented programming:  there is a lot of badly designed 
> and implemented C++ code out there).  For a time, working with trees 
> is like slogging through molasses, and then you get the "Aha!" 
> experience and things become easy.  It usually is not about designing 
> the perfect tree structure; instead, it is about simplifying the 
> recognition problem and expressing target language constructs in tree 
> form.

For the record, I had no trouble "getting" LISP when I learned it 25 
years ago. When I started with C++, I don't think I
really "got" OOD, and only started writing real OO code when learning 
Java forced me to. I think the fact that LISP never
became "mainstream" means that it failed to be easy enough to grasp. 
Regardless of how inherently beautiful it is,
if a lot of programmers don't easily "get it", then it's not that great.

As for tree structures, the AST shapes are pretty arbitrary. Monty's C 
grammar and the two java.g grammars build
similar trees, but there are differences whose only justification is "just 
because it was designed that way".
With a one-dimensional token stream, we don't have that problem. There's 
one less level of abstraction between what's
on the screen and the mental picture in my head. To this day, I'm not 
sure off the top of my head what the AST
is for, say, "public static void main(String[] args) {". I have trouble 
keeping the C AST, the Java AST, and the mapping
between the two in my head all at once. And this is the most trivial case!
And yet, as a sequence of tokens, there's no mental work at all. What I 
"see" is the same as my mental image.
(And what I "see" is a sequence of tokens, not a sequence of chars).
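
Just to make that gap concrete, here's the comparison I have in mind, as a 
toy Java sketch. The token list really is what's on the screen; the tree 
string is a shape I made up purely for illustration - I'm deliberately not 
claiming it matches Monty's grammar or either java.g, because that's the 
point: each grammar invents its own imaginary roots and its own ordering.

import java.util.Arrays;
import java.util.List;

public class TokenVsTree {
    public static void main(String[] args) {
        // The token view: one-to-one with the source text, no extra structure.
        List<String> tokens = Arrays.asList(
            "public", "static", "void", "main",
            "(", "String", "[", "]", "args", ")", "{");
        System.out.println("tokens: " + tokens);

        // One *possible* AST shape for the same header, in (root child...) form.
        // The shape is hypothetical; a different grammar would pick different
        // imaginary root tokens and a different child order.
        String onePossibleAst =
            "(METHOD_DEF (MODIFIERS public static) (TYPE void) main"
            + " (PARAMETERS (PARAMETER_DEF (TYPE (ARRAY_OF String)) args)))";
        System.out.println("one possible AST: " + onePossibleAst);
    }
}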

>
> If I were working on a natural language problem, would I use trees?  
> Sure!  Trees are a very convenient way of capturing semantics into 
> syntactic form.  Would I generate output with a visitor?  For 
> Esperanto or one of the Romance languages, possibly (for these 
> languages, simple syntactic principles apply); for English, that is 
> unlikely--too much "peephole" substitution is necessary to handle 
> special cases.  Instead, I would most likely process the input into a 
> canonical tree with a regular structure (converting input to tree form 
> is about transforming the special cases) and do some form of 
> rule-based substitution for all of the nasty colloquialisms and other 
> special cases.  (In compiler parlance, that is called "peephole 
> optimization", often an essential for generating good code.)

And yet, from what my admittedly feeble research has found, NLP 
systems generally don't use a tree.
If they do, it's not at the heart of the system - they are not "tree 
processors".

>
> As to compilers and compiler technology:  you might be surprised.  
> "Compiler technology" and "language translation technology" are 
> effectively synonyms, and the technical term for the translators you 
> have discussed in this group is "cascading compiler" (or used to be; 
> the term has pretty much disappeared from the literature as compiler 
> technology research has declined in popularity).  Most language 
> translation problems are much simpler than a conventional compiler and 
> are solved using a subset of the algorithms that go into building an 
> optimizing compiler.  

I always thought a "compiler" was just a translator whose output happens 
to be at a lower level than its input - typically
machine code or byte code. But then, I'm just going on an undergrad 
compiler class from 20 years ago and the
Dragon Book, so I guess I'm no expert.

> Nothing that you have described doing is out of place in a 
> conventional compiler environment.  You might find it interesting to 
> take a look at GNU RTL, used to generate the peephole optimizer for 
> the gcc toolset.

Huh??? That makes no sense to me. I've described lots of things that are 
completely out of place in
a conventional compiler. A conventional compiler doesn't translate 
library calls, like "printf" to "System.out.println".
I do stuff like "strcpy(v1, v2) --> v1 = v2", replacing a C strcpy() 
call with a Java assignment.
No conventional compiler does anything like that, because technically, 
it's wrong!
I rename files, methods, and variables based on user-specified mappings. 
Compilers don't do that.
I could go on.
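
To show the kind of rewrite I mean, here's a toy sketch of the strcpy 
pattern as a token-level substitution. This is only an illustration of the 
idea, not how my code actually does it, and it only handles single-identifier 
arguments. And it's "technically wrong" because strcpy copies characters 
into an existing buffer, while the Java assignment just makes v1 refer to 
the same object - which is usually what the human translator wanted anyway.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StrcpyRule {
    // Rewrites the token pattern  strcpy ( v1 , v2 )  into  v1 = v2 .
    // Sketch only: real C arguments are arbitrary expressions, so a real
    // rule has to match balanced parentheses, not single tokens.
    static List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < tokens.size(); i++) {
            if (i + 5 < tokens.size()
                    && tokens.get(i).equals("strcpy")
                    && tokens.get(i + 1).equals("(")
                    && tokens.get(i + 3).equals(",")
                    && tokens.get(i + 5).equals(")")) {
                out.add(tokens.get(i + 2));   // v1
                out.add("=");
                out.add(tokens.get(i + 4));   // v2
                i += 5;                       // skip the matched call
            } else {
                out.add(tokens.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("strcpy", "(", "v1", ",", "v2", ")", ";");
        System.out.println(apply(in));   // prints [v1, =, v2, ;]
    }
}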

>
> As to your "unconventional" approach:  I hate to say this, but 
> everything that you have described doing is well documented in the 
> literature.

Maybe you could point me to something specific. Today I'm adding smarts 
to my GotoRemoverRule so that it generates code that
is no faster or more compact than what it produced before, but looks 
better to the human eye. I doubt that
there's any literature on that.
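
For instance, one pattern that matters is turning a backward goto into a 
plain loop. Neither version below is faster or smaller, but the second is 
what a human would have written. The example is made up to illustrate the 
kind of cleanup I mean, not literal GotoRemoverRule output; the C input is 
in the comment and the structured Java output follows.

public class GotoExample {
    /* A typical C input:
     *
     *     i = 0;
     * again:
     *     total += prices[i];
     *     i++;
     *     if (i < n) goto again;
     *
     * A mechanical translation would wrap this in flags and breaks; the
     * version that "looks better to the human eye" is just a do/while loop.
     */
    static int total(int[] prices, int n) {
        int total = 0;
        int i = 0;
        do {
            total += prices[i];
            i++;
        } while (i < n);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(total(new int[] {3, 4, 5}, 3));  // prints 12
    }
}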

> I have noticed that "healthy disrespect for conventional approaches" 
> usually translates to "I never check to see if I am reinventing the 
> wheel or not".  One of the great resources on the web is Citeseer:  
> <http://citeseer.ist.psu.edu/>.  It has some of the good early papers 
> on pattern matching and transformation systems.

I've looked at many, many papers on those. Can you point me to anything 
specific that you
think I'm missing?

I've also noticed that the person who feels like they're "breaking new 
ground" often just hasn't done their homework. That's why
I spend so much time typing here - I'm worried that I might be doing 
exactly that. So I'm not just being defensive;
I really am looking for pointers.

Thanks,
Andy


