[antlr-interest] philosophy about translation
Kay Roepke
kroepke at classdump.org
Wed Oct 11 16:54:25 PDT 2006
On 12 Oct 2006, at 1:12, Andy Tripp wrote:
> Yea, maybe. But it's one thing to use a tree-like data structure as
> one of many data structures during processing.
> It's quite another to architect the whole thing as a "tree-walking
> approach".
I see what you mean. I don't have the experience to judge whether a
tree-walking approach would actually be feasible in this case (esp.
considering Cobol), though I still think I'd go the tree way. This is
not to say that some sort of rule system wouldn't solve some things
more easily. The example you give below still strikes me as being
implementable with trees, though. I guess it's just that I haven't
seen an example that is really convoluted with trees.
>> Would you need one rule for each supported type instead of one
>> rule for all non-pointer types and one exception for pointer types?
>
> If things get non-trivial, I can mix this pattern language with
> code. For example, to verify that
> the "v1" in "v1 **v2 --> v1 v2[][];" really is a type, I could say
> something like:
>
> class StarChecker extends PatternRule {
>     StarChecker() {
>         super("v1 ** v2", "v1 v2[][]");
>     }
>
>     boolean match(Source source) {
>         if (super.match(source)) {
>             // get the text that the "v1" placeholder matched
>             String v1 = results.get("v1");
>             // verify that it's really a type (e.g. using symbol table)
>             if (source.isType(v1)) {
>                 return true;
>             }
>         }
>         return false;
>     }
> }
OK, so it's a sort of semantic predicate, like those in grammars, only
that it lives in hand-written code rather than in a grammar. Of course,
it is using some sort of grammar anyway, because your rule engine still
uses a kind of pattern language.
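To make the comparison concrete for myself, here is a minimal,
self-contained sketch of the same idea. It is not your PatternRule API:
the class name StarRuleSketch, the regular expression, and the plain
set of type names standing in for a symbol table are all assumptions of
mine. The syntactic match plays the role of the pattern, and the set
lookup plays the role of the semantic predicate.

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a "v1 ** v2" -> "v1 v2[][]" rule guarded by a type check.
final class StarRuleSketch {
    // Assumed stand-in for a symbol table: the names known to be types.
    private final Set<String> knownTypes;

    // Syntactic part of the rule: identifier, two stars, identifier.
    private static final Pattern DOUBLE_STAR =
        Pattern.compile("(\\w+)\\s*\\*\\s*\\*\\s*(\\w+)");

    StarRuleSketch(Set<String> knownTypes) {
        this.knownTypes = knownTypes;
    }

    // Returns the rewritten text, or an empty Optional if the rule does not apply.
    Optional<String> rewrite(String source) {
        Matcher m = DOUBLE_STAR.matcher(source.trim());
        if (!m.matches()) {
            return Optional.empty();   // the pattern itself did not match
        }
        String v1 = m.group(1);
        if (!knownTypes.contains(v1)) {
            return Optional.empty();   // "predicate" failed: v1 is not a type
        }
        return Optional.of(v1 + " " + m.group(2) + "[][]");
    }

    public static void main(String[] args) {
        StarRuleSketch rule = new StarRuleSketch(Set.of("int", "char"));
        System.out.println(rule.rewrite("char **argv")); // v1 is a known type
        System.out.println(rule.rewrite("a ** b"));      // v1 is not a type
    }
}
```

With "char" registered as a type, the first call yields the rewritten
declaration, while "a ** b" is left alone because "a" is not a known
type, so it could just as well be a multiplication by a dereferenced
pointer.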
> Seems pretty trivial to find the variable declaration either way.
Yeah, I've probably done too much maths... I always start to think in
terms of metrics measuring some value for things. I was thinking of a
"distance" metric in the tree or token stream as an indicator for
complexity. But as you said, for your application speed is not of
paramount concern, so this is simply not applicable.
> Right. I'm trying to change the dynamics on that and get people to
> believe that it can be done.
> I believe my product does it, but it's still a tough sell. About
> 1/3 of the people who come up to our booth
> at tradeshows are "skeptics". They come up, take a quick look, and
> then ask "how do you handle unions?" or
> "...memset?" or "multiple inheritance." By the time I've started
> explaining about how memset is almost always
> used to initialize a struct to zero, and we check for that sort of
> thing, they walk away. It's how we programmers
> naturally are: we sure want our compilers to work 100%, and it
> seems like a translator should, too. So since
> it's impossible, in theory, we go back and do it by hand (or don't
> do it). It never occurs to us that someone could simply
> automate the stuff that everyone's doing by hand and save everyone
> 90% of the time and effort.
>
> The traveling salesman problem is NP-complete, and yet we have no
> problem letting algorithms that are less than
> perfect do the best they can because they're so much better than
> humans. It's a shame we can't seem to take
> the same approach with rewriting code to a new language.
I totally agree with you on that point. Having a tool available is so
much better in any craft. It's hard to understand why some people
cannot see the value of that. I mean having a compiler still doesn't
write programs for you, but it saves you from all the nitty gritty
details you don't want to bother with. Funnily enough, in other areas
they do accept that: see garbage collection for one thing. Nowadays
everyone jumps onto that train. It ain't perfect in many cases, but
it will help you to get your work done sooner. Same applies to your
project. And incidentally the same applies to IDEs, too. At least
that's slowly changing. I don't want to go back to vi to edit my
projects. Too. Much. Hassle.
Kudos for trying to change that!
>> I'm not an expert in linguistics, far from it, so I can't really
>> say anything for NLP.
>
> Yea, me neither. I was pretty shocked at how different the NLP
> approaches are from "compiler" approaches.
> Seemed like zero overlap. I'm still pretty shocked at how bad NLP
> seems to be, but I guess I have just
> one data point: babelfish.
I think the real problem with NLP vs. most artificial languages is:
You do not need to declare objects before usage in natural languages.
We don't go around pre-declaring everything we want to talk or write
about. Also, natural languages are much more ambiguous and highly
context sensitive. Furthermore, the semantics of a word or phrase can
depend heavily on intonation if you are dealing with speech. And last
but not least, natural language can grossly violate grammatical rules
and still be understood. This is not generally the case with
artificial languages.
-k