[antlr-interest] philosophy about translation
Loring Craymer
lgcraymer at yahoo.com
Wed Oct 11 18:00:10 PDT 2006
--- Andy Tripp <antlr at jazillian.com> wrote:
> As for tree structures, the AST shapes are pretty
> arbitrary. Monty's C
> grammar and the two java.g grammars build
> similar trees, but there are differences where the
> only reason for the
> difference is "just because it was designed that
> way".
> With a one dimensional token stream, we don't have
> that problem. There's
> one less level obstraction between what's
> on the screen and the mental picture in my head. To
> this day, I'm not
> sure off the top of my head what the AST
> is for, say, "public static void main(String[] args)
> {". I have trouble
> keeping the C AST, the Java AST, and the mapping
> between the two in my head all at once. And this is
> the most trivial case!
There are simpler cases, but this is a pretty fair
example of the conceptual problem. Visualizing the
tree is usually a distraction; you are a lot closer to
thinking effectively with your "stream of tokens"
model. The point of the tree structure is to simplify
the recognition problem: "down" and "right" can be
used to capture distinctions. ANTLR always does an
inorder traversal of the tree, which translates to a
token sequence. Ter actually formalized this in the
ANTLR 3 TreeNodeStream: "DOWN" and "UP" are generated
tokens.
> And yet, as a sequence of tokens, there's no mental
> work at all. What I
> "see" is the same as my mental image.
> (And what I "see" is a sequence of tokens, not a
> sequence of chars).
>
> >
> > If I were working on a natural language problem,
> would I use trees?
> > Sure! Trees are a very convenient way of
> capturing semantics into
> > syntactic form. Would I generate output with a
> visitor? For
> > Esperanto or one of the Romance languages,
> possibly (for these
> > languages, simple syntactic principles apply); for
> English, that is
> > unlikely--too much "peephole" substitution is
> necessary to handle
> > special cases. Instead, I would most likely
> process the input into a
> > canonical tree with a regular structure
> (converting input to tree form
> > is about transforming the special cases) and do
> some form of
> > rule-based substitution for all of the nasty
> colloquialisms and other
> > special cases. (In compiler parlance, that is
> called "peephole
> > optimization", often an essential for generating
> good code.)
>
> And yet, from what my admitedly feeble research
> found is that NLP
> systems generally don't use a tree.
> If they do, it's not at the heart of the system -
> they are not "tree
> processors".
NLP (neuro-linguistic processing) is done by wetware.
Natural language processing, however, is almost always
done with trees masked by "Attribute Grammars". [That
is post-parsing; GLR and "Corner Parsing" algorithms
are usually used for recognition, and Europe seems to
be the hotbed for published research in the area.]
> I always thought a "compiler" was just a translator
> that happened to
> have an output that's at a lower level than the
> input - typically
> machine code, or byte code. But then, I guess I'm
> just remembering an
> undergrad compiler class from 20 years ago, and the
> dragon book, so I guess I'm no expert.
I was careful to specify "compiler technology" as
distinguished from production compilers. Research
compilers often do some very strange (and wonderful)
things. Also, "lower level" is a subjective
judgement.
>
> > Nothing that you have described doing is out of
> place in a
> > conventional compiler environment. You might find
> it interesting to
> > take a look at GNU RTL, used to generate the
> peephole optimizer for
> > the gcc toolset.
>
> Huh??? That makes no sense to me. I've described
> lots of things that are
> completely out of place in
> a conventional compiler. A conventional compiler
> doesn't translate
> library calls, like "printf" to
> "System.out.println".
> I do stuff like "strcpy(v1, v2) --> v1 = v2",
> replacing a C strcpy()
> call with a Java assignment.
> No conventional compiler does anything like that,
> because technically,
> it's wrong!
Check out memcpy for gcc targeting an x86 for an
example of a compiler-translated library call. As for
the assignment, I see nothing wrong; you have added
the restrictions that all strings are read-only. That
would cause errors if naively translated statements
like "v1[5] = ' ';", but I assume that you do the
right thing.
> I rename files, methods, and variables based on
> user-specified mappings.
> Compilers don't do that.
> I could go on.
Most of the translations are not user-specified, true.
But any C/C++ compiler renames methods and variables
according to user specifications (macros), and
supports file renaming for output.
>
> >
> > As to your "unconventional" approach: I hate to
> say this, but
> > everything that you have described doing is well
> documented in the
> > literature.
>
> Maybe you could point me to something specific.
> Today I'm adding smarts
> to my GotoRemoverRule that generates code that
> is not faster or more compact than what it was
> producing before, but it
> looks better to the human eye. I doubt that
> there's any literature on that.
That sort of thing is usually isolated to "see what I
can do" claims in papers. Compiler-generated
assembler files are usually quite readable (well, for
assembly language) and that dates back to the days
when it was necessary to claim that "FORTRAN-generated
assembler is as good as (or nearly as good as)
programmer-generated code".
>
> > I have noticed that "healthy disrespect for
> conventional approaches"
> > usually translates to "I never check to see if I
> am reinventing the
> > wheel or not". One of the great resources on the
> web is Citeseer:
> > <http://citeseer.ist.psu.edu/>. It has some of
> the good early papers
> > on pattern matching and transformation systems.
>
> I've looked at many, many papers on those. Can you
> point me to anything
> specific that you
> think I'm missing?
No, sorry, you'll have to do your own research. I
never have enough time to do as much as I would like
for myself!
>
> I've also noticed that the person who feels like
> they're "breaking new
> ground" just hasn't done his homework. That's why
> I spend so much time typing here - worried that I
> might be doing that.
> So I'm not just being defensive,
>
This is a user group, not a theory group, so it is not
the ideal forum for this sort of sanity check
(although it is certainly better than none). I do
have to think that dragging Monty out of the woodwork
for this discussion was a significant achievement! I
only jumped in after I thought that sanity had
prevailed to make the point that "it's really not the
methodology that's at issue, it's the level of tool
support".
--Loring
=== message truncated ===
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
More information about the antlr-interest
mailing list