[antlr-interest] Moving from SableCC to ANTLR; What are tree grammars?

Andreas Stefik stefika at gmail.com
Sat Apr 3 20:01:40 PDT 2010


Hi Tyler,

On Sat, Apr 3, 2010 at 8:21 PM, Tyler Distad <tyler.distad at gmail.com> wrote:
> My Computer Science professor has taught his Compiler course with SableCC
> for years. I am investigating moving the class to ANTLR.
>

As a professor that teaches compilers myself on occasion, I would
highly recommend the change.

> Under SableCC, after creating a valid grammar, we used the Visitor pattern
> to perform semantic checking and AT&T assembly generation. To do this, we
> created a SemanticChecker class and a CodeGen class, both extending from the
> SableCC DepthFirstAdapter class. This allowed for a beautiful separation of
> the grammar definition and our processing code. We could also easily make
> multiple passes over our AST.

You can certainly do this sort of thing in ANTLR as well, although
using Visitor is generally unnecessary, as you have tree grammars,
which in my view, are far superior.

>
> I have hunted high and low for an ANTLR-specific method of doing anything
> remotely similar. It has been intimated (
> http://antlr.org/article/1100569809276/use.tree.grammars.tml) that tree
> grammars should let me do what I want, but I must be misunderstanding
> because my implementation is wholly inadequate.
>
> Two questions:
>
>  1) What is the point of a tree grammar? My work so far seems to indicate
> that anything I can do in a "tree grammar" (such as actions, rewrites,
> etc.), I can just as easily do in a "combined grammar."

In non-technical terms, tree grammars iterate over your AST in a way
that is metaphorically similar to traversing over your AST's using
Visitor, except that you don't have to write, and more importantly
maintain, a visitor, and writing a tree grammar is massively easier
(in my opinion). In our case, we use a combined grammar as a first
pass and a modified tree grammar as a later pass. I've written
compilers using visitors exactly like you say, have contributed to
compilers without visitors (and without tree grammars), but the
largely automated approach to tree grammars is a real time saver.

Doing it this way also makes it really easy to carefully control which
actions you toss into your parser and your tree grammar, which is
important when writing multiple pass compilers. Doing this with
visitor is possible, of course, but in visitor classes, I always found
we needed to maintain a ton of state information about what is going
on. With tree grammars, you never have to do that, which simplifies
code significantly. And for the obvious things you do need to pass
around, you can use Scoping features of ANTLR, which is very helpful.

>
>  2) Assuming tree grammars are useful, then when working with them, do I
> *really* have to copy/paste my rule definitions from my combined grammar?
> The simplecTreeParser example in the examples-v3 file on the ANTLR website
> certainly looks that way. I want to just be able to reference my tree
> somewhere...not redefine the whole thing for every single pass.

It's true that, on its face, it sounds like copy pasting rules into a
tree grammar is a bad idea --- but this is not quite correct. First,
you rarely copy paste in all the rules directly, as you usually want
to modify them to make your AST traversal simpler. The most typical
example of this is in expression rules, which are always a bit gross
in a parser, but are rewritten and clean in a tree grammar. We usually
also use tree rewrite rules in a number of places to ease parsing as
well.

But anyway, this only sounds like a bad idea until you try it. In
practice, copy pasting in rules into a new file, with all new,
carefully controllable (if you want), new semantic actions, is
extremely helpful.


In sum, here's a few advantages:

1. Visualization of DFAs and CFGs is extremely helpful, for students
(and myself sometimes, honestly, when I get stuck on a tricky rule)
2. Visualizations include the ability to analyze grammar conflicts ---
also very handy
3. LL star instead of LL k (like JavaCC --- Sable CC is LALR k if I
recall correctly)
4. Semantic and Syntactic predicates make writing very complex
grammars much easier than with traditional parsers
5. Tree grammars more or less get rid of the visitor pattern, which I
strongly prefer. I understand the visitor pattern very, very, well,
and have written compilers with it, but it drives me crazy sometimes.
6. Extremely excellent documentation, especially Terrence's books. I
haven't read the new one yet, but compared to most compiler books on
the market, Terrence's ANTLR book was a real breath of fresh air.

And here's some disadvantages:

1. Some of my students find writing certain rules in an LL grammar
more difficult than LALR, especially expressions.
2. Some of the ANTLR syntax/semantics is pretty non-intuitive for
students. I'm thinking specifically of rewrite rules like ^(NODE
OTHER_NODE), or a few other things. That isn't to say that things are
any easier in SableCC/JavaCC/Lex/Yacc, or any other language, as they
aren't, but still ...

Stefik


>
> Tyler Distad
>
> For reference, below is a snippet of my non-tree-grammar. I do NOT want to
> copy/paste this code into a new tree-grammar definition. I want to be able
> to easily work with it from outside the AST.
>
>    stmt: stmtAsmt
>        | stmtIf
>        | stmtWhile
>        | expr SEMICOLON_CH -> ^(STMT expr)
>    ;
>
>    stmtAsmt
>        : ID ASSIGN_OP expr SEMICOLON_CH
>        -> ^(STMT ID expr)
>        ;
>
>    stmtIf
>        : IF_KW L_PAR_CH expr R_PAR_CH L_BRACE_CH stmt* R_BRACE_CH (ELSE_KW
> L_BRACE_CH stmt* R_BRACE_CH)?
>        -> ^(STMT expr stmt* stmt*)
>        ;
>
>    stmtWhile
>        : WHILE_KW L_PAR_CH expr R_PAR_CH L_BRACE_CH stmt* R_BRACE_CH
>        -> ^(STMT expr stmt*)
>        ;
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>


More information about the antlr-interest mailing list