[antlr-interest] Fwd: De-emphasizing tree grammars?

Thu Dec 22 21:01:47 PST 2011

Ooops, I meant to send this to the list rather than to Terence
privately. Been happening to me a lot recently since gmail changed
their UI. Grrr.

---------- Forwarded message ----------
From: Gavin King <gavin.king at gmail.com>
Date: Tue, Dec 20, 2011 at 3:51 PM
Subject: Re: [antlr-interest] De-emphasizing tree grammars?
To: Terence Parr <parrt at cs.usfca.edu>

On Tue, Dec 20, 2011 at 3:25 PM, Terence Parr <parrt at cs.usfca.edu> wrote:

> The AST/ tree grammar mechanism also is something I've been pushing for 20 years
> without success.  I started out in the compilers world where you really do need an AST,
> to avoid all of the noise of rule names inside your tree.

So here's a little bit of feedback on that. I've used ANTLR very
seriously on two projects now, first for the JPA-QL to SQL translation
of the Hibernate ORM library, and now on the compiler and IDE for the
Ceylon programming language.

For the query translator in Hibernate, we used a 3-stage translator
(parser/tree builder, followed by tree transformer, followed by
renderer) where the second two grammars were tree grammars. This
worked out somewhat well, though debugging was quite difficult and
managing less 1-to-1 type transformations was very hard. We were
successful, but I was left doubting that tree grammars had really made
our life easier.

For the Ceylon compiler, I did not even try to use a tree grammar,
because for building a real compiler you want a proper typesafe syntax
tree, not a bunch of untyped tuples. Java isn't lisp, and if you're
trying to write lisp in Java you're unlikely to have much success ;-)
I did initially try letting ANTLR build its untyped tree for
subsequent translation to a typed syntax tree, but this approach was a
failure for two reasons:

1. it's an additional unnecessary step in the pipeline, and, much more
importantly,
2. ANTLR has an awful habit of just throwing away whole branches of
its syntax tree when some rule fails to match far down the tree.

In a real compiler, you *never* want to throw away things the
programmer typed in. That's especially true if, as in our case, the
compiler is also going to be the typechecker of your IDE. IDEs can
simply never, ever throw away input, no matter how many syntax errors
it has. Indeed, the hardest "bit" of building an IDE is doing stuff
like auto-completion correctly while the user is typing, when you're
almost guaranteed that the text is not well-formed.

So basically my conclusion after all this is that ANTLRs tree stuff is
pretty much a misfeature. It's cute for simpler problems, but once you
get beyond hello world, you're better off just writing your own syntax
tree using Java. Which is what I've done in Ceylon.

Now, if ANTLR was able to produce a *typesafe* syntax tree, and not
keep throwing bits of it away all the time, then that would be
something really useful...

--
Gavin King
gavin.king at gmail.com
http://in.relation.to/Bloggers/Gavin
http://ceylon-lang.org
http://hibernate.org
http://seamframework.org

-- 
Gavin King
gavin.king at gmail.com
http://in.relation.to/Bloggers/Gavin
http://ceylon-lang.org
http://hibernate.org
http://seamframework.org