[antlr-interest] De-emphasizing tree grammars?

Gavin King gavin.king at gmail.com
Fri Dec 23 20:39:33 PST 2011


On Fri, Dec 23, 2011 at 7:40 PM, Jason Osgood <jason at jasonosgood.com> wrote:

> What's a "typed syntax tree"? From looking at your code, it's a
> Java class hierarchy representing the parts of your language, and you
> build an object graph representing a program, versus a parse tree, AST,
> or DOM. You use your own objects for "nodes" instead of untyped DOM or
> parse nodes.
>
> Right?

Right, I mean where you have a Java class that represents each kind of
node in the tree (i.e. each kind of syntactic construct). So your
compiler is composed of visitors that work with a strongly-typed
representation of the syntax tree.
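Here is a minimal sketch of what that looks like in Java, for a hypothetical toy expression language. The node names `NumberLiteral`, `Add`, `Call` and the `Printer` visitor are illustrative assumptions, not Ceylon's actual classes. Each syntactic construct gets its own type, and each compiler phase is a visitor over those types:

```java
import java.util.List;

// Hypothetical node classes for a toy expression language: one Java type
// per syntactic construct, instead of a generic untyped AST node.
interface Node {
    <R> R accept(Visitor<R> v);
}

// One visit method per node class; each compiler phase implements this.
interface Visitor<R> {
    R visitNumber(NumberLiteral n);
    R visitAdd(Add n);
    R visitCall(Call n);
}

record NumberLiteral(int value) implements Node {
    public <R> R accept(Visitor<R> v) { return v.visitNumber(this); }
}

record Add(Node left, Node right) implements Node {
    public <R> R accept(Visitor<R> v) { return v.visitAdd(this); }
}

record Call(String function, List<Node> arguments) implements Node {
    public <R> R accept(Visitor<R> v) { return v.visitCall(this); }
}

// Example phase: a pretty-printer. A type checker or code generator
// would be another Visitor implementation over the same node classes.
class Printer implements Visitor<String> {
    public String visitNumber(NumberLiteral n) {
        return Integer.toString(n.value());
    }

    public String visitAdd(Add n) {
        return "(" + n.left().accept(this) + " + " + n.right().accept(this) + ")";
    }

    public String visitCall(Call n) {
        StringBuilder sb = new StringBuilder(n.function()).append("(");
        for (int i = 0; i < n.arguments().size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(n.arguments().get(i).accept(this));
        }
        return sb.append(")").toString();
    }
}
```

Because the visitor interface lists one method per node class, adding a new construct makes every phase fail to compile until it handles that construct. That is much of the appeal over untyped nodes.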

> Just skimmed your Ceylon.g. I totally forgot about that strategy
> (using ANTLR rules with parameters and return values).
>
> I've done that. I didn't like it. Instead of stitching an object graph
> together with inlined Java, I opted instead to use a Builder. Method
> calls and an internal stack instead of "new" and assignment. Wasn't
> much better.

A Builder might work out a little cleaner; I'm not sure. Truth is, it
never occurred to me. To be honest, I think it would probably wind up
about the same.

(I was just basically following the same model that ANTLR uses for its
non-typesafe AST building.)
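For comparison, here is a rough sketch of the Builder approach described above. Grammar actions (or any other caller) make method calls, and an internal stack replaces `new` and assignment. `TreeBuilder` and the single generic `TreeNode` record are hypothetical names that keep the example short. A typed version would construct the appropriate node class when each construct ends.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal node type; a typed design would use one class per
// construct, but a single record keeps this sketch short.
record TreeNode(String label, List<TreeNode> children) {}

// Builder driven purely by method calls. An internal stack holds the
// children collected so far for each construct that is still open.
class TreeBuilder {
    private final Deque<String> labels = new ArrayDeque<>();
    private final Deque<List<TreeNode>> pending = new ArrayDeque<>();

    TreeBuilder() {
        pending.push(new ArrayList<>()); // collector for the root node
    }

    // A construct begins (e.g. at the start of a grammar rule).
    void begin(String label) {
        labels.push(label);
        pending.push(new ArrayList<>());
    }

    // A leaf (e.g. a matched token) is added to the innermost open construct.
    void leaf(String label) {
        pending.peek().add(new TreeNode(label, List.of()));
    }

    // A construct ends: pop its children and attach it to its parent.
    void end() {
        List<TreeNode> children = pending.pop();
        TreeNode node = new TreeNode(labels.pop(), List.copyOf(children));
        pending.peek().add(node);
    }

    // Returns the single completed root, or fails if begin/end were unbalanced.
    TreeNode result() {
        if (!labels.isEmpty() || pending.size() != 1 || pending.peek().size() != 1) {
            throw new IllegalStateException("unbalanced begin/end calls");
        }
        return pending.peek().get(0);
    }
}
```

The grammar side then reduces to calls like `begin("call")`, `leaf("max")`, `end()`, which is tidier than inline `new` and field assignment but still couples the grammar to the builder.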

I'm not trying to hold up my grammar as some kind of model of clean
code. The best I can say is that it works, and that it works really
well for non-well-formed input, mainly because I painstakingly coded
in a whole lot of special cases to improve on ANTLR's default error
recovery, which is often not good enough for what you need in an IDE.

> I very much dislike hybrid languages. Stuff like inlining Java in
> one's grammar, C#'s LINQ, template languages. I have a hard enough
> time understanding one language at a time.

Trust me, I hate them way more than you do ;-) The point I was trying
to make is that I wish ANTLR could write that tree-building code for
me, instead of me having to hand-code a lot of tedious stuff in Java
code embedded in a text file where I don't have any kind of error
reporting or autocompletion or any of the other stuff I've been
totally dependent on for the last ten years...

> Anyway.
>
> Next time, for my ARON project, I'll subclass DebugEventListener and
> move all that bookkeeping there. Builders are still complicated, true.
> That's just the nature of Builders.

Hah! It would not have occurred to me to use an interface called
DebugEventListener to implement my tree building logic! Is that
something people do? So it gives you more like a SAX-style callback
API to build your tree? That sounds like it could work out much nicer.
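Here is a rough sketch of that SAX-style idea. It assumes a simplified, hypothetical `ParseEvents` interface with rule-entry, rule-exit and token events. This is not the real `DebugEventListener` signature. The listener lives entirely outside the grammar and recovers the tree's nesting purely from the order of the callbacks. To stay short, it renders an S-expression rather than building node objects:

```java
// Hypothetical SAX-style callback interface, loosely modeled on the kind of
// rule-entry, rule-exit and token events a parser can report.
// NOT the actual ANTLR DebugEventListener API.
interface ParseEvents {
    void enterRule(String ruleName);
    void exitRule(String ruleName);
    void consumeToken(String text);
}

// Listener kept separate from the grammar: nesting is reconstructed from
// the event order alone and rendered as a Lisp-style S-expression.
class SExpressionListener implements ParseEvents {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    // Put a space between siblings (never before the very first item).
    private void separate() {
        if (out.length() > 0) out.append(' ');
    }

    public void enterRule(String ruleName) {
        separate();
        out.append('(').append(ruleName);
        depth++;
    }

    public void exitRule(String ruleName) {
        out.append(')');
        depth--;
    }

    public void consumeToken(String text) {
        separate();
        out.append(text);
    }

    String result() {
        if (depth != 0) throw new IllegalStateException("unbalanced rule events");
        return out.toString();
    }
}
```

In a real setup the generated parser would fire these events while it parses. A tree-building listener would use the same three callbacks and push and pop node objects instead of appending text.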

> But at least the Builder and grammar are distinct.

Right, that would be ideal.

I was thinking more along the lines of wishing ANTLR could build the
tree for me, but out of typesafe node classes, and without the
throwing-away-bits-of-the-tree behaviour that caused me so many
problems. But perhaps a SAX-style API would just be a simpler, more
robust solution.


> I DON'T build a "typed syntax tree" in my fado project, for a couple
> of reasons. My pathetic efforts to build a generic SQL object model
> were easily defeated. What I really needed to do was find and replace
> the interesting bits. By leaving the parse tree in place, I was able
> to re-emit the input stream with just the interesting bits changed,
> preserving formatting, comments, etc.
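Here is a minimal sketch of that "change only the interesting bits" technique. It assumes replacements are recorded against character offsets of the original input, for example taken from token start/stop positions. `SourceRewriter` is a hypothetical class, similar in spirit to ANTLR 3's `TokenRewriteStream` but not its API. Everything outside an edit is copied verbatim, so formatting and comments survive:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical rewriter: record edits against offsets in the original
// text, then re-emit the input with only those ranges changed.
class SourceRewriter {
    private record Edit(int start, int end, String replacement) {}

    private final String source;
    private final List<Edit> edits = new ArrayList<>();

    SourceRewriter(String source) {
        this.source = source;
    }

    // Replace source[start, end) with the given text. In practice the
    // offsets would come from token positions in the parse tree.
    void replace(int start, int end, String replacement) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IllegalArgumentException("bad range: " + start + ".." + end);
        }
        edits.add(new Edit(start, end, replacement));
    }

    // Apply edits in offset order; untouched text is copied as-is.
    String render() {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt(Edit::start));
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Edit e : sorted) {
            if (e.start() < pos) throw new IllegalStateException("overlapping edits");
            out.append(source, pos, e.start()); // original text between edits
            out.append(e.replacement());
            pos = e.end();
        }
        out.append(source, pos, source.length()); // trailing original text
        return out.toString();
    }
}
```

Because nothing is re-printed from a model, whitespace, comments and any syntax the tool does not understand pass through unchanged.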

Yes, in my original mail I was sort of trying to distinguish between
something like JPAQL->SQL translation where you're essentially doing a
minimal sort of transformation and a "real" compiler where you're
doing a whole bunch of complex typechecking and then finally some
quite sophisticated transformation at the end, where the output
artifacts don't have a strict 1-1 correspondence to the input
artifacts. This is the kind of thing you want to do completely in Java
land with a typesafe syntax tree.

> The Ceylon and fado use cases are very different. But I think moving
> inlined Java code out of our grammars is a positive step forward.

Definitely.

> Which is why I posted my original question. Debug event listener hooks
> are only generated with a command-line switch. Terence didn't
> anticipate they'd be used like this.

Ah right. Figures ;-)

> Thinking about it, I should probably mention:
>
> ANTLR's DebugEventListener is kinda like a SAX EventListener.

Haha, I had not read to the end when I wrote the comments further up.

So I think you might be right: it might be *much* better for ANTLR to
provide a SAX-style callback API, where we can write Java code with
all the benefits of a Java IDE, instead of the embedded actions we
have to use today. That definitely gels with all the experience I
have had with ANTLR. It would also be an easier API to get right than
the thing I was suggesting.

If I had had an API like this available to me when I was working on
the Ceylon grammar, I think it's definitely what I would have used.


-- 
Gavin King
gavin.king at gmail.com
http://in.relation.to/Bloggers/Gavin
http://ceylon-lang.org
http://hibernate.org
http://seamframework.org

