[antlr-interest] postmortem

Thu Mar 13 12:54:27 PDT 2008

Thomas Brandon wrote:
> On Fri, Mar 14, 2008 at 2:00 AM, Andy Tripp <antlr at jazillian.com> wrote:
>   
>>> Thomas Brandon wrote:
>>>       
>>  On Thu, Mar 13, 2008 at 7:58 AM, Andy Tripp <antlr at jazillian.com> wrote:
>>     
>>>  Yes, most ANTLR users want a non-flat tree. But most (if not all) of
>>> these users do not want a parse tree.
>>>       
>>  Agreed. But keep in mind that for many applications, simply walking the AST
>> by hand is enough.
>>     
> I'm familiar with your views so I tried to though it's a rather alien
> approach to me so forgive any oversights..
>   
>>>  The parse trees generated by
>>> ANTLR are not just like ASTs and cannot be used as such. Parse trees
>>> consist of standard AST nodes for all the actual language nodes and
>>> special parse tree nodes (of type ParseTree extending CommonTree) for
>>> the rule references. These parse tree nodes have a token type of 0.
>>> Thus you cannot use a tree parser against a parse tree and manually
>>> walking the tree would be complicated. Parse trees (as generated by
>>> ANTLR) are more of a debugging aid than a type of AST that
>>> automatically adds structure.
>>>       
>>  ...I think that in many cases processing the parse tree would be very
>> simple:
>>
>>  if (!(ast instanceof ParseTree)) {
>>     doSomethingWith(ast);
>>  } else {
>>     // ignore ParseTree nodes
>>  }
>>
>>     
> Well, that's not going to process the children of parse trees which is
> presumably not what you want. And if you're skipping them then what
> use are they exactly?
>   
Sorry, I left out and assumed everyone would mentally fill in that the 
children might always get processed:
for (child: ast.getChildren()) {
    walk(child);   // recursive call to process children
}
> But my point was that with no token type you can switch on token
> types. So you'd have to switch on the string name, which gets rather
> nasty to maintain, and means changing a rule name will break your tree
> walker but give no compile time errors. Not especially easy or
> desirable.
>   
Uhhh...yea, OK. Sorry, I'm not familiar with ParseTree. Couldn't we use 
"instanceof" then...switch
on the class of the node?
> <SNIP>A fair bit of going round in circles with proposed heuristics</SNIP>
>
>   
>>  ...but I guess maybe that puts us where we started - generating a full
>> parse tree.
>>  But even without solving this problem - go ahead and leave the tree as
>> being hard to walk -
>>  we're still ahead. The goal here is not to generate a "good" AST, but
>> rather just produce something
>>  that's better than nothing (i.e. a flat tree). So the newbie gets this
>> tree, sees that it's hard to
>>  distinguish the various cases where he's got an ID node, and starts reading
>> up on how to build
>>  an AST. He's better off here than the alternative: a blank stare at a flat
>> ast.
>>     
> <SNIP>more circles</SNIP>
>   
>>  Thanks for writing this up. Let me know if you think the replies I gave
>> would work.
>>  As I said, if they work, but only make things somewhat better, but not a
>> completely usable AST, that's
>>  still a win IMO. If they don't work, then how about simply say "if there
>> are no ^'s, I'll just use
>>  the existing code to get a parse tree and return that"?
>>
>>  Andy
>>     
> (Can you please stop HTML posting or at least use a client with decent
> quoting in it's HTML. Gmail can't handle your quoting and I've found
> it's generally pretty good at that.)
>   
Yes, sorry about that.
> I could go through your various proposed rules and point out the case
> where they produce a structure I can't see anyone wanting but I gather
> you admit that point.
>   
Yes. Just trying to have an default AST who's usefulness is greater than 
0, not necessarily 100%
or close to it.
> The detect ^ and switch off automatic rules doesn't work because the
> user may want a flat AST.
>   
Uh Oh. See my exchange with Jim on that.
Please explain a (non-contrived) case where someone may want a flat AST.
> And I don't accept your view that producing a bad AST is better than
> producing a flat AST. Is anyone going to use a bad AST? 
I think so. If nothing else, just to verify that the parser is doing 
something reasonable.

When it came time for me to start using C++ ASTs for expressions, I 
looked into it and decided
I could just use a C parser and C ASTs, after all, expressions are the 
same in C and C++, right?
Just this week, over a year later, I finally hit a case where a 
particular C++ expression isn't valid in C.

The point is that I was (mostly) fine using a subset of a C grammar for C++.
Now that I find that the C grammar is "bad" for C++, what did  I do? I found
a way to easy get around the problem by eliminating the invalid syntax from
the input. That may seem like a hack (the purist would have used a real 
C++ grammar), but it
does work and was far easier than switching grammars. My job is to get 
my application working
well as quickly as possible, not to learn any more of ANTLR than I need to.
> I can't see
> why they would and don't think they should. Is it easy to allow such
> auto-generation to be mixed with explicit construction to generate a
> good AST? I don't think so. 
As I said in another post, I'd simply say "if there are no explicit 
AST-building constructs in
the grammar, return the parse tree". No mixing.
> At best you end up with a grammar that's
> messier and harder to understand than if you'd just used manual
> construction. 
No mixing. The grammar is whatever it is, no different in any way.
If it has no ^ or ! characters, you get a parse tree.
> So you've spent all the time developing a system 
"All the time" being about 20 lines of code: declare a flag, set the 
flag whenever you see
^ or !. In getTree(), if the flag is not set, return the parse tree instead.
> that no
> one is ever going to use for more than the first grammar run that's
> only purpose is to avoid giving someone a flat AST. 
Yes, avoiding giving a flat AST is one purpose. I've said why that's 
good, and a couple
of people agreed (including the original poster, who bothered enough to 
mention it).

I also think even a "bad AST" would be useful. For one, to verify that 
the parser is working
as intended. Also, I think some applications would actually be able to 
just use the "bad AST".
The guy who isn't even using an AST today might use it, if he gets it 
"for free". The guys
who wrote that tool to automatically generate it might use it. And, of 
course, a
parse tree really is useful (e.g. ANTLRWorks uses it), it's just a 
matter of whether
it's provided as "the default" when you've asked for an AST but provided 
no "^".
> And if they don't
> look at enough docs\examples first and, heaven forbid, get a flat AST
> they either go to the docs and quickly recognise their flawed
> expectations of what output=AST will do or send a message to the list
> that requires all of two or three sentences to answer. 
By that time, it's "too late". From the original post, you can see what 
happens.
After going through the long process of building a tree-like grammar, 
and having it successfully
parse some input, people realistically expect to be able to see some tree.

Suppose you say "i = 1;" without declaring i, in Java or any other 
statically typed language.
And suppose that that reference is in "dead code" that's never reached. 
The compiler could
silently ignore the problem, and the output bytecode (or executable or 
whatever) is perfectly
valid. But the user if left saying "I wonder why the output isn't what I 
expected?" and start
to dislike the tool.

The same is true here. This guy put "output=AST" and got out something 
unexpected.
Sure, he can search the book, the wiki, ask on the mailing list, and 
find the answer.
Just as the guy who forgot to define his variable could do all those 
things to find the answer.

Making sure that the answer is available is one thing, avoiding the 
confusion in the first place is
another. At a minimum, don't let me put "output=AST" and then forget to 
put any ^'s in
my grammar. There's no need to let me make that mistake. It's a 
usability issue.
> How is this in
> anyway a good use of time or a justifiable addition of code
> complexity?
>   
It would have helped the original poster, who seems so mad he won't be 
using ANTLR any more.
Obviously, it's not just this issue. In all these posts, I'm arguing for 
more usability work
overall, not just this specific AST thing.

I realize these "make it easier for the newbie"
suggestions make little sense to Terence and most others, and Terence 
has no time for stuff
like this. But usability does matter. I suspect for each frustrated guy 
like this who posted,
there are at least several more who didn't bother.

By analogy, all the C compilers have lots of documentation and mailing 
lists
and lots of users, and you could probably spend a lifetime with the gcc 
documentation alone.
And yet their error messages all suck compared to javac's messages. I 
don't know
exactly how Sun managed that, but I do know there's something 
fundamentally better
about javac.

Andy
> Tom.
>
>   

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20080313/ab7f0d7c/attachment-0001.html