[antlr-interest] philosophy about translation

Mon Oct 9 01:52:19 PDT 2006

This looks like a good place to jump in.  My experience is that language processing involves:
1.)  Recognition
2.)  Tree construction
3.)  Constructing data structures for analysis
4.)  Analysis
5.)  Post-analysis tree restructuring
6.)  Output

Steps 3 through 6 are often repeated in complex language processing problems; some steps may be skipped, depending on the problem addressed.  Not all language processors have an output stage--reading config files is an example--and many analyses can be done without special data structures.  A full optimizing compiler will go through all of the steps (including multiple analyses), but we have all written processors that do only steps 1 and 6.
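
As a rough illustration of the stages listed above, here is a minimal Python sketch for a toy expression language (integers, identifiers, "+", "*", and parentheses).  It is not taken from ANTLR or any other tool mentioned in this thread; every name in it (tokenize, parse, free_variables, fold, emit) is made up for the example, and a real processor would use a generated lexer and parser rather than a hand-written one.

```python
import re

# Step 1, recognition: split the input into tokens (hypothetical toy lexer).
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[+*()])")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

# Step 2, tree construction: recursive descent, building nested tuples such as
# ("+", left, right), ("num", 3) or ("var", "x").
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return ("num", int(tok))
        if tok.isidentifier():
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = atom()
        while peek() == "*":
            advance()
            node = ("*", node, atom())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

# Steps 3 and 4, analysis data structure plus analysis: the set of variables used.
def free_variables(node):
    if node[0] == "num":
        return set()
    if node[0] == "var":
        return {node[1]}
    return free_variables(node[1]) | free_variables(node[2])

# Step 5, post-analysis restructuring: fold subtrees whose operands are constants.
def fold(node):
    if node[0] in ("num", "var"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        value = left[1] + right[1] if op == "+" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)

# Step 6, output: print the (restructured) tree back out as text.
def emit(node):
    if node[0] in ("num", "var"):
        return str(node[1])
    return f"({emit(node[1])} {node[0]} {emit(node[2])})"
```

For example, emit(fold(parse(tokenize("x * (2 + 3) + 4 * 5")))) walks all six steps except the variable analysis, and dropping fold() gives the "steps 1 and 6 only" style of processor.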

Tree restructuring, in my experience, often makes analysis easier; it is also helpful (and possibly critical) when dealing with languages that deal with very different problem domains.  Yet the recurring discussion in this news group goes something like "real men build trees in actions" (actually, more like "writing action code to build trees is easy (you wimps)") and "no problem needs tree walkers--visitors are more than sufficient".  I came to the conclusion some time ago that the real messages are that "ANTLR 2 rewrite support is inadequate" and "writing tree grammars is painful"--both valid points.

A while back I put together a version of ANTLR with native tree rewrite support and automatic tree grammar generation (I needed a paper for a conference I wanted to go to, and Ter was going to be coauthor--unfortunately, I did not get the work done in time to go to the conference) and released it as ANTLR 2.8.  It did not get widespread use--the licensing terms agreed to by JPL did not sit well with the community--but I have gotten used to rapid turnaround on tree restructuring and tree grammar generation.  That makes a very big difference in how I perceive the cost of working with restructured trees--minimal--and in my willingness to do multi-pass transformations.  I used to find writing tree grammars painfully slow; now the problem is refactoring generated grammars and propagating changes to the refactored grammars.  I need a new tool for that next step, but I can live with refactoring and propagating by hand for the moment.  If you are working without automatic tree grammar generation, though, demand better:  you are being crippled by a lack of proper tools.

One of the problems I had with ANTLR 2.8 was a lack of an attribute syntax.  If I changed a token type manually, I had to edit the generated grammar to match.  That was annoying; also, I could see that having some sort of syntax to structure data would be very helpful in building data structures for analysis passes.  Then, too, I often find it awkward for grammars to be target-dependent; very often, a grammar that was written for a Java app would be very useful for a C++ app--except for the cost of editing all of the actions and then having the two grammars diverge in the syntax that they recognize (another instance of the maintenance problem).  Then there was the annoyance of writing print statements in actions for output--very inelegant.  Fortunately, Ter solved that problem with StringTemplate.  The others--except for refactoring--are addressed by Yggdrasil (or will be--it will still be a while before I have tree grammar generation properly supported).

The take-home message?  It makes a lot more sense to improve the tools than to develop a warped perspective because of their current shortcomings.

--Loring

Andy Tripp <antlr at jazillian.com> wrote:
Sohail Somani wrote:

>On Sat, 2006-10-07 at 13:40 -0400, Andy Tripp wrote:
>  
>
>>The getContainingFunction() method is real. 
>>It looks backwards
>>(while balancing matching braces) for something that looks like a 
>>function declaration: a "{"
>>preceded by a ")", with a matching "(" that's preceded by an ID.
>>
>>Crazy, I know :)
>>    
>>
>
>Yikes! For me it's something like:
>
>functionDefn : 
>{setCurrentFunction(function_name);}
>
>Why you need to parse the thing umpteen times, I don't know, but you
>might have a valid reason!
>  
>
I have a function that tells me which function I'm in, rather than 
setting a variable
while walking the code, because...

a) I have hundreds of "rules"/"phases", and only a couple need to know 
what function I'm in.
Given that it's not trivial to know when I'm at a function declaration 
(because I'm "walking"
token streams rather than ASTs), it would be a huge waste to keep track 
of that. Basically,
for each token in each file, I'd be checking to figure out if it's a 
function declaration.

b) It's not actually clear, in COBOL, what a function *is*. There are 
paragraphs, which
typically map to a function, but there can also be "stray code" at the 
top of a file that's
not in a paragraph but needs to be in a function.

c) I have a feeling there might be a problem if I move code around. I 
can't think of a specific
example right now, but that's my general thinking for avoiding symbol 
table use if I can - better
to have a single data structure (in my case a token stream) rather than 
two (a token stream
and a symbol table) that need to be kept in sync.

But I agree with your general point: if you really often need to "know 
where you are" then
an AST helps a lot. I've found that I rarely need to know "where I am" in 
the source.
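
For readers following along, the backward scan described in the quoted text earlier in this message could be sketched roughly as below. This is only an illustration in Python over a list of token strings in a C-like brace syntax; it is not the actual getContainingFunction() code, and the names get_containing_function, matching_open_paren and CONTROL_KEYWORDS are hypothetical. The keyword filter is an added assumption so that "if (...) {" is not mistaken for a function.

```python
# Assumed set of keywords that can precede "(...) {" without being a function.
CONTROL_KEYWORDS = {"if", "while", "for", "switch"}

def matching_open_paren(tokens, close_index):
    """Walk backwards from a ')' to the index of its matching '(' (or None)."""
    depth = 0
    for i in range(close_index, -1, -1):
        if tokens[i] == ")":
            depth += 1
        elif tokens[i] == "(":
            depth -= 1
            if depth == 0:
                return i
    return None

def get_containing_function(tokens, index):
    """Return the name of the function enclosing tokens[index], or None."""
    depth = 0  # number of already-closed blocks we still have to skip over
    for i in range(index - 1, -1, -1):
        tok = tokens[i]
        if tok == "}":
            depth += 1
        elif tok == "{":
            if depth > 0:
                depth -= 1  # this brace opens a block that closed before us
                continue
            # An enclosing "{": does it look like a function declaration,
            # i.e. ID "(" ... ")" "{" ?
            if i > 0 and tokens[i - 1] == ")":
                open_paren = matching_open_paren(tokens, i - 1)
                if open_paren is not None and open_paren > 0:
                    name = tokens[open_paren - 1]
                    if name.isidentifier() and name not in CONTROL_KEYWORDS:
                        return name
            # Otherwise keep scanning outward to the next enclosing block.
    return None

source = "int f ( int a ) { if ( a ) { g ( a ) ; } return a ; } int h ( ) { return 1 ; }"
tokens = source.split()
print(get_containing_function(tokens, tokens.index("g")))  # token inside the "if" block of f
```

The scan costs time proportional to the distance back to the enclosing declaration, which is the trade-off under discussion: nothing is tracked while walking, and the work is done only by the few rules that ask.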

Andy

>Sohail
>
>  
>
