[antlr-interest] philosophy about translation

Wed Oct 11 16:54:25 PDT 2006

On 12 Oct 2006, at 1:12, Andy Tripp wrote:

> Yea, maybe. But it's one thing to use a tree-like data structure as  
> one of many data structures during processing.
> It's quite another to architect the whole thing as a "tree-walking  
> approach".

I see what you mean. I don't have the experience to judge whether a
tree-walking approach would actually be feasible in this case
(especially considering Cobol), though I still think I'd go the tree
way. This is not to say that some sort of rule system wouldn't make
some things easier to solve. Still, the example you give below
strikes me as being implementable with trees. I guess I just haven't
seen an example that becomes really convoluted with trees.

>> Would you need one rule for each supported type instead of one
>> rule for all non-pointer types and one exception for pointer types?
>
> If things get non-trivial, I can mix this pattern language with
> code. For example, to verify that the "v1" in "v1 **v2 --> v1 v2[][];"
> really is a type, I could say something like:
>
> class StarChecker extends PatternRule {
>     StarChecker() {
>         super("v1 ** v2", "v1 v2[][]");
>     }
>
>     @Override
>     boolean match(Source source) {
>         if (super.match(source)) {
>             // get the text that the "v1" placeholder matched
>             String v1 = results.get("v1");
>             // verify that it's really a type (e.g. using the symbol table)
>             if (source.isType(v1)) {
>                 return true;
>             }
>         }
>         return false;
>     }
> }

OK, so it's a sort of semantic predicate, as in grammars, except that
it lives in hand-written code rather than in a grammar. Of course, you
are still using a grammar of sorts, because the patterns your rule
engine understands form a language of their own.
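The StarChecker snippet relies on a PatternRule base class, a Source type and a results map that aren't shown. As a self-contained sketch under those assumptions, here is one way such a rule could work; every class and method name below (PatternRule, StarChecker, PatternDemo, tokenize, rewrite) is hypothetical and not the API of Andy's product. Placeholders v1, v2, ... each bind one word token, and a subclass can veto a purely syntactic match with a semantic check, which is essentially a semantic predicate living outside the grammar:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pattern rule: placeholders v1, v2, ... each bind one word token.
class PatternRule {
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");
    private static final Pattern PLACEHOLDER = Pattern.compile("\\bv\\d+\\b");

    private final List<String> pattern;
    private final String template;
    // Text captured by each placeholder during the most recent match
    protected final Map<String, String> results = new HashMap<>();

    PatternRule(String pattern, String template) {
        this.pattern = tokenize(pattern);
        this.template = template;
    }

    // Splits text into words and single punctuation characters ("**" -> "*", "*")
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    // Purely syntactic match of the pattern against tokens starting at 'start'
    boolean match(List<String> tokens, int start) {
        results.clear();
        if (start + pattern.size() > tokens.size()) return false;
        for (int i = 0; i < pattern.size(); i++) {
            String p = pattern.get(i);
            String t = tokens.get(start + i);
            if (PLACEHOLDER.matcher(p).matches()) {
                // A placeholder binds one word; a repeated placeholder must bind the same text
                String prev = results.putIfAbsent(p, t);
                if (!t.matches("\\w+") || (prev != null && !prev.equals(t))) return false;
            } else if (!p.equals(t)) {
                return false;
            }
        }
        return true;
    }

    // Fills the template with the text captured by the last successful match
    String rewrite() {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(results.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }
}

class StarChecker extends PatternRule {
    private final Set<String> knownTypes; // stand-in for a real symbol table

    StarChecker(Set<String> knownTypes) {
        super("v1 ** v2", "v1 v2[][]");
        this.knownTypes = knownTypes;
    }

    @Override
    boolean match(List<String> tokens, int start) {
        // Syntactic match first, then the semantic check on what v1 captured
        return super.match(tokens, start) && knownTypes.contains(results.get("v1"));
    }
}

class PatternDemo {
    // Applies the rule at the start of the source text; null means "no match"
    static String apply(PatternRule rule, String source) {
        List<String> tokens = PatternRule.tokenize(source);
        return rule.match(tokens, 0) ? rule.rewrite() : null;
    }

    public static void main(String[] args) {
        StarChecker rule = new StarChecker(Set.of("int", "char"));
        System.out.println(apply(rule, "int **grid")); // int grid[][]
        System.out.println(apply(rule, "a ** b"));     // null: "a" is not a type
    }
}
```

Without the symbol-table check, "a ** b" would match too (in C it means a * (*b)) and be rewritten into a bogus declaration; the known-types set is only a placeholder for real symbol-table lookups.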

> Seems pretty trivial to find the variable declaration either way.

Yeah, I've probably done too much maths... I always start thinking in
terms of metrics that measure some value for things. I was thinking
of a "distance" metric in the tree or token stream as an indicator of
complexity. But as you said, speed is not of paramount concern for
your application, so this simply doesn't apply.
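To make the "distance" idea a bit more concrete, here is a rough sketch of the two measures: token distance is the gap between two positions in the flat token stream, while tree distance counts the edges from one node up to the lowest common ancestor and back down to the other. The Node and Distances classes and the hand-built tree are purely illustrative assumptions (no parser is involved), and both nodes are assumed to belong to the same tree:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tree node with a parent pointer (no real parser involved)
class Node {
    final String label;
    final Node parent;

    Node(String label, Node parent) {
        this.label = label;
        this.parent = parent;
    }

    int depth() {
        return parent == null ? 0 : parent.depth() + 1;
    }
}

class Distances {
    // Edges on the path a -> lowest common ancestor -> b (same tree assumed)
    static int treeDistance(Node a, Node b) {
        int da = a.depth(), db = b.depth(), steps = 0;
        while (da > db) { a = a.parent; da--; steps++; }
        while (db > da) { b = b.parent; db--; steps++; }
        while (a != b) { a = a.parent; b = b.parent; steps += 2; }
        return steps;
    }

    // Number of positions between two tokens in the flat token stream
    static int tokenDistance(int i, int j) {
        return Math.abs(i - j);
    }

    public static void main(String[] args) {
        // Token stream and a hand-built tree for: int x ; while ( c ) { x = 1 ; }
        List<String> tokens = Arrays.asList("int x ; while ( c ) { x = 1 ; }".split(" "));
        Node block = new Node("block", null);
        Node decl = new Node("decl x", block);
        Node loop = new Node("while", block);
        Node assign = new Node("assign", loop);
        Node use = new Node("x", assign);

        System.out.println("token distance = "
                + tokenDistance(tokens.indexOf("x"), tokens.lastIndexOf("x"))); // 7
        System.out.println("tree distance = " + treeDistance(decl, use));      // 4
    }
}
```

The same declaration/use pair can be far apart in one measure and close in the other, so which one counts as "complexity" is itself a design choice.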

> Right. I'm trying to change the dynamics on that and get people to  
> believe that it can be done.
> I believe my product does it, but it's still a tough sell. About  
> 1/3 of the people who come up to our booth
> at trade shows are "skeptics". They come up, take a quick look, and
> then ask "how do you handle unions?" or
> "...memset?" or "multiple inheritance." By the time I've started
> explaining how memset is almost always
> used to initialize a struct to zero, and we check for that sort of  
> thing, they walk away. It's how we programmers
> naturally are: we sure want our compilers to work 100%, and it  
> seems like a translator should, too. So since
> it's impossible, in theory, we go back and do it by hand (or don't  
> do it). It never occurs to us that someone could simply
> automate the stuff that everyone's doing by hand and save everyone  
> 90% of the time and effort.
>
> The traveling salesman problem is NP-complete, and yet we have no
> problem using imperfect algorithms that simply do the best they can,
> because they're so much better than humans. It's a shame we can't
> seem to take the same approach with rewriting code to a new language.

I totally agree with you on that point. Having a tool available is so
much better in any craft. It's hard to understand why some people
cannot see the value of that. I mean, a compiler doesn't write
programs for you either, but it saves you from all the nitty-gritty
details you don't want to bother with. Funnily enough, people do
accept this in other areas: take garbage collection, for one.
Nowadays everyone is jumping on that bandwagon. It isn't perfect in
many cases, but it helps you get your work done sooner. The same
applies to your project. And incidentally, the same applies to IDEs,
too, although at least attitudes there are slowly changing. I don't
want to go back to vi to edit my projects. Too. Much. Hassle.

Kudos for trying to change that!
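Andy's remark that memset is almost always used to zero a struct suggests a simple idiom check. A minimal sketch, assuming one C statement at a time and a regular expression (the thread doesn't say how his product actually does it): it recognizes the common spellings of memset(&s, 0, sizeof s) and memset(p, 0, sizeof *p) and reports the variable being zeroed. Anything else, including zeroing an array, returns null and would be left for manual review. The MemsetIdiom class and zeroedVariable method are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recognizer for the "zero this struct" memset idiom in C source
class MemsetIdiom {
    // Accepts memset(&s, 0, sizeof s), memset(&s, 0, sizeof(s)),
    // memset(p, 0, sizeof *p) and memset(p, 0, sizeof(*p)), with optional ';'
    private static final Pattern ZERO_INIT = Pattern.compile(
            "memset\\s*\\(\\s*(&?)\\s*(\\w+)\\s*,\\s*0\\s*,"
            + "\\s*sizeof\\s*\\(?\\s*(\\*?)\\s*(\\w+)\\s*\\)?\\s*\\)\\s*;?");

    // Returns the variable being zeroed, or null if the call is not the idiom
    static String zeroedVariable(String statement) {
        Matcher m = ZERO_INIT.matcher(statement.trim());
        if (!m.matches()) {
            return null;
        }
        boolean takesAddress = !m.group(1).isEmpty(); // &s
        boolean dereferences = !m.group(3).isEmpty(); // sizeof *p
        String target = m.group(2);
        String sized = m.group(4);
        // &s must pair with sizeof s, and p with sizeof *p; anything else is suspect
        return target.equals(sized) && takesAddress != dereferences ? target : null;
    }
}
```

A translator could map a non-null result to a freshly default-initialized object in the target language; the pairing test is what rejects suspicious combinations such as memset(&s, 0, sizeof *s).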

>> I'm not an expert in linguistics, far from it, so I can't really
>> say anything about NLP.
>
> Yea, me neither. I was pretty shocked at how different the NLP  
> approaches are from "compiler" approaches.
> Seemed like zero overlap. I'm still pretty shocked at how bad NLP  
> seems to be, but I guess I have just
> one data point: babelfish.

I think the real problem with NLP, compared with most artificial
languages, is this: in natural languages you do not need to declare
objects before using them. We don't go around pre-declaring
everything we want to talk or write about. Also, natural languages
are much more ambiguous and highly context-sensitive. Furthermore, if
you are dealing with speech, the semantics of a word or phrase can
depend heavily on intonation. And last but not least, natural
language can grossly violate grammatical rules and still be
understood. This is not generally the case with artificial languages.
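The declaration point is easy to illustrate: in most artificial languages a tool can mechanically flag a name that is used before it is declared, something natural language never requires. A toy sketch over a made-up mini-language (the DeclareBeforeUse class and its "var NAME" syntax are assumptions for illustration only):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy declare-before-use check for a hypothetical mini-language in which
// "var NAME" declares NAME and every other identifier in a statement is a use.
class DeclareBeforeUse {
    static List<String> undeclaredUses(List<String> statements) {
        Set<String> declared = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (String statement : statements) {
            String[] words = statement.trim().split("\\s+");
            if (words.length == 2 && words[0].equals("var")) {
                declared.add(words[1]); // declaration: NAME is now in scope
                continue;
            }
            for (String word : words) {
                // identifiers only; numbers and operators are skipped
                if (word.matches("[A-Za-z_]\\w*") && !declared.contains(word)) {
                    errors.add(word);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // y is used one statement before it is declared
        System.out.println(undeclaredUses(List.of("var x", "x = y + 1", "var y"))); // [y]
    }
}
```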

-k