[antlr-interest] ANTLR AST vs Tree Grammar

Mon Oct 26 04:50:05 PDT 2009

Hi Tommy,

Tree Grammars and ASTs are not competing in the same space. AST is just the
tree of nodes that represent your input. Tree Grammar is a grammar made
specifically to parse those  ASTs.

You usually use them to walk through an AST you already built, and do things
with it. Think about the multiple passes a compiler needs to go through to
generate the final binary. In the first pass, you use the parser and lexer
to understand the basic syntax and generate the AST, next you may need to
simplify some code, remove unnecessary instructions, set the type
information for variables, change the structure, or even output code in
another language.

Basically you may build as many tree grammars you need to do those passes,
and pipe them to produce the desired output.

The tree grammar looks like a parser grammar, but it actually uses node
types instead of tokens to identify the patterns, so they are not exactly
different grammars to parse the same thing.

Hope it helps.

On Sun, Oct 25, 2009 at 5:46 AM, Tommy Chheng <tommy.chheng at gmail.com>wrote:

> I'm using ANTLR to construct a small general purpose compiler. I've used
> JFlex/CUP in the past but ANTLR looks like it can help generate better code
> in a more streamline process. I'm reading the Language Implementation
> Patterns book which has a few ANTLR examples (
> http://pragprog.com/titles/tpdsl/source_code)
>
> I'm a little confused about difference between the Tree Grammar and AST
> output option in ANTLR.
>
> In walking/tree-grammar folder of (
> http://pragprog.com/titles/tpdsl/source_code),  there's a lexer/parser
> grammar (like VecMath.g) with a header:
> options {output=AST;} // we want to create ASTs
>
> This will just produce the parser code(VecMathLexer/VecMathParser), but not
> the actual AST?
> In the folder, i see a bunch of "VarNode, IntNode..". Were these AST
> classes just manually created?
>
> Then if I want to actually produce an AST tree walk output. I would have to
> create something like Printer.g:
> tree grammar Printer; // this grammar is a tree grammar called Printer
> tokenVocab=VecMath;      // use token vocabulary from VecMath.g
> ASTLabelType=CommonTree; // use homogeneous CommonTree for $ID, etc.
>
> In Printer.g, it looks like a DIFFERENT set of parsing rules? Isn't it
> problematic to have 2 sets of grammar parsing rules?
> Printer.g:
> expr:   ^('+' expr {print("+");} expr)
>     |   ^('*' expr {print("*");} expr)
> vs
> VecMath.g:
> expr:    multExpr ('+'^ multExpr)* ;     // '+' is root node
> multExpr
>     :   primary (('*'^|'.'^) primary)*  // '*', '.' are roots
>
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20091026/81cc0158/attachment.html