[antlr-interest] philosophy about translation

Wed Oct 11 18:00:10 PDT 2006

--- Andy Tripp <antlr at jazillian.com> wrote:
> As for tree structures, the AST shapes are pretty
> arbitrary. Monty's C 
> grammar and the two java.g grammars build
> similar trees, but there are differences where the
> only reason for the 
> difference is "just because it was designed that
> way".
> With a one dimensional token stream, we don't have
> that problem. There's 
> one less level obstraction between what's
> on the screen and the mental picture in my head. To
> this day, I'm not 
> sure off the top of my head what the AST
> is for, say, "public static void main(String[] args)
> {". I have trouble 
> keeping the C AST, the Java AST, and the mapping
> between the two in my head all at once. And this is
> the most trivial case!

There are simpler cases, but this is a pretty fair
example of the conceptual problem.  Visualizing the
tree is usually a distraction; you are a lot closer to
thinking effectively with your "stream of tokens"
model.  The point of the tree structure is to simplify
the recognition problem:  "down" and "right" can be
used to capture distinctions.  ANTLR always does an
inorder traversal of the tree, which translates to a
token sequence.  Ter actually formalized this in the
ANTLR 3 TreeNodeStream:  "DOWN" and "UP" are generated
tokens.

> And yet, as a sequence of tokens, there's no mental
> work at all. What I 
> "see" is the same as my mental image.
> (And what I "see" is a sequence of tokens, not a
> sequence of chars).
> 
> >
> > If I were working on a natural language problem,
> would I use trees?  
> > Sure!  Trees are a very convenient way of
> capturing semantics into 
> > syntactic form.  Would I generate output with a
> visitor?  For 
> > Esperanto or one of the Romance languages,
> possibly (for these 
> > languages, simple syntactic principles apply); for
> English, that is 
> > unlikely--too much "peephole" substitution is
> necessary to handle 
> > special cases.  Instead, I would most likely
> process the input into a 
> > canonical tree with a regular structure
> (converting input to tree form 
> > is about transforming the special cases) and do
> some form of 
> > rule-based substitution for all of the nasty
> colloquialisms and other 
> > special cases.  (In compiler parlance, that is
> called "peephole 
> > optimization", often an essential for generating
> good code.)
> 
> And yet, from what my admitedly feeble research
> found is that NLP 
> systems generally don't use a tree.
> If they do, it's not at the heart of the system -
> they are not "tree 
> processors".

NLP (neuro-linguistic processing) is done by wetware. 
Natural language processing, however, is almost always
done with trees masked by "Attribute Grammars".  [That
is post-parsing; GLR and "Corner Parsing" algorithms
are usually used for recognition, and Europe seems to
be the hotbed for published research in the area.]

> I always thought a "compiler" was just a translator
> that happened to 
> have an output that's at a lower level than the
> input - typically
> machine code, or byte code. But then, I guess I'm
> just remembering an 
> undergrad compiler class from 20 years ago, and the
> dragon book, so I guess I'm no expert.

I was careful to specify "compiler technology" as
distinguished from production compilers.  Research
compilers often do some very strange (and wonderful)
things.  Also, "lower level" is a subjective
judgement.

> 
> > Nothing that you have described doing is out of
> place in a 
> > conventional compiler environment.  You might find
> it interesting to 
> > take a look at GNU RTL, used to generate the
> peephole optimizer for 
> > the gcc toolset.
> 
> Huh??? That makes no sense to me. I've described
> lots of things that are 
> completely out of place in
> a conventional compiler. A conventional compiler
> doesn't translate 
> library calls, like "printf" to
> "System.out.println".
> I do stuff like "strcpy(v1, v2) --> v1 = v2",
> replacing a C strcpy() 
> call with a Java assignment.
> No conventional compiler does anything like that,
> because technically, 
> it's wrong!

Check out memcpy for gcc targeting an x86 for an
example of a compiler-translated library call.  As for
the assignment, I see nothing wrong; you have added
the restrictions that all strings are read-only.  That
would cause errors if naively translated statements
like "v1[5] = ' ';", but I assume that you do the
right thing.

> I rename files, methods, and variables based on
> user-specified mappings. 
> Compilers don't do that.
> I could go on.

Most of the translations are not user-specified, true.
 But any C/C++ compiler renames methods and variables
according to user specifications (macros), and
supports file renaming for output.  

> 
> >
> > As to your "unconventional" approach:  I hate to
> say this, but 
> > everything that you have described doing is well
> documented in the 
> > literature.
> 
> Maybe you could point me to something specific.
> Today I'm adding smarts 
> to my GotoRemoverRule that generates code that
> is not faster or more compact than what it was
> producing before, but it 
> looks better to the human eye. I doubt that
> there's any literature on that.

That sort of thing is usually isolated to "see what I
can do" claims in papers.  Compiler-generated
assembler files are usually quite readable (well, for
assembly language) and that dates back to the days
when it was necessary to claim that "FORTRAN-generated
assembler is as good as (or nearly as good as)
programmer-generated code".

> 
> > I have noticed that "healthy disrespect for
> conventional approaches" 
> > usually translates to "I never check to see if I
> am reinventing the 
> > wheel or not".  One of the great resources on the
> web is Citeseer:  
> > <http://citeseer.ist.psu.edu/>.  It has some of
> the good early papers 
> > on pattern matching and transformation systems.
> 
> I've looked at many, many papers on those. Can you
> point me to anything 
> specific that you
> think I'm missing?

No, sorry, you'll have to do your own research.  I
never have enough time to do as much as I would like
for myself!

> 
> I've also noticed that the person who feels like
> they're "breaking new 
> ground" just hasn't done his homework. That's why
> I spend so much time typing here - worried that I
> might be doing that. 
> So I'm not just being defensive,
> 

This is a user group, not a theory group, so it is not
the ideal forum for this sort of sanity check
(although it is certainly better than none).  I do
have to think that dragging Monty out of  the woodwork
for this discussion was a significant achievement! I
only jumped in after I thought that sanity had
prevailed to make the point that "it's really not the
methodology that's at issue, it's the level of tool
support".

--Loring
=== message truncated ===

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com