[antlr-interest] grammars, tree grammars and parse trees, Oh my!

Tue Mar 10 11:54:58 PDT 2009

At 01:40 11/03/2009, Naim Kingston wrote:
>I'm not sure if our (mine and yours) definitions of a parse tree 
>are the same, but with boost::spirit in C++, I've been 
>implementing a small scripting language, and I've gotten very 
>used to having a tree that I can traverse, with each node 
>describing what grammar rule it represents, and only some of the 
>nodes (integers, operators and other grammar rules that 
>correspond directly to input) have text associated with the tree 
>node.

That is indeed what a parse tree is.

>This is the kind of thing that I'm looking at doing in C# for a 
>smaller calculator-like side project. I'm generating a parser and 
>lexer for my language using antlr 3.1.2. Is a 'parse tree' the 
>structure I'm looking for?
>
>I've been generating an AST, but that just has the token type in 
>each node, and that won't always help in understanding the 
>context of the token.

ANTLR doesn't have any built-in facilities for creating parse 
trees (except for the debugger), though it's not impossible to add 
in something similar.  It's usually not what you really want, 
though.

With ASTs, you normally don't just build the tree from input 
nodes, you also add additional "imaginary" tokens (usually as the 
subtree root) that serve as grouping constructs and identification 
for later use by a tree parser (or manual tree parsing 
code).  This way, they have the semantic information you'd get 
from a parse tree, without being rigidly tied to the rule 
structure (which means that you can refactor the rules without 
affecting the output, and you can leave out levels that don't 
really add any useful information).

>I've been looking at creating a tree grammar, but are the 
>'actions' that you define simply the code that you insert in the 
>tree node definitions? In fact, are the rules in a tree grammar 
>the content of each of the nodes in the tree? If I'm not planning 
>on re-ordering anything, or putting in any actions (I want to do 
>most things code-side apart from the parse tree construction) do 
>I need to create the tree grammar, or can I just use the normal 
>grammar I wrote?

The whole point of a tree grammar is usually to walk through an 
existing tree and carry out some actions as you go.  (Another 
possibility is to output a modified AST, or to use StringTemplate 
to generate some output.)  It's basically just a replacement for 
writing your own tree-walking code.  If you'd rather write the 
walking code yourself, that's fine.