[antlr-interest] flat AST tree

Gavin Lambert antlr at mirality.co.nz
Sat Aug 23 23:52:30 PDT 2008


At 15:13 24/08/2008, Torsten Curdt wrote:
 >When I now look at the AST it appears to be totally flat
[...]
 >But that totally blows up the grammar. Why isn't it just
 >the same hierarchy as the input tree?

What hierarchy?  How can ANTLR possibly know which of the tokens 
is the logical root of the tree, and of the subtrees?  Operators, 
for example, commonly appear infix, while things like class 
declarations appear prefix, with additional modifiers optionally 
prefixed and suffixed to that.  There's just no way it can create 
a sensible structure without information from the grammar author, 
so it doesn't try.

You can either use a full rewrite notation, as you showed in your 
original email, or you can use a more concise representation for 
the simple case where there's only one root and you want the 
tokens to appear in order.

For example, the rewrite you posted:
   classDeclaration : 'class' Identifier ( 'extends' 
identifierList)?
classBody -> ^('class' Identifier ( 'extends' identifierList)?
classBody) ;

Could have just been written like this:
   classDeclaration : 'class'^ Identifier ('extends' 
identifierList)? classBody;

Either way, it will produce a tree like this:
   ^('class' Identifier (contents of classBody...))
or this:
   ^('class' Identifier 'extends' (contents of identifierList...) 
(contents of classBody...))

You can extend this to modify the tree a bit; for example to put 
the 'extends' clause into a subtree if it's present, you can use 
this:
   classDeclaration : 'class'^ Identifier baseClass? classBody;
   baseClass : 'extends'^ identifierList;

If you wanted to change the order things appeared in, though, or 
insert additional tokens not present in the input (imaginary 
tokens), then you have to use a -> rewrite.  (You can omit tokens 
present in the input with !, though.)

The things to remember are:
   1. You can either use a -> rewrite *or* the ^ and ! operators 
within any given rule -- never both.  (You get *really* obscure 
errors if you violate this.)  Different rules can use different 
types, though.
   2. -> gives you more flexibility but you need to restate 
things, and you need to make sure that the cardinality on both 
sides matches
   3. If you're not using a -> rewrite, then you can have at most 
one ^ within any given rule.
   4. The root node must be a single token.  You can't make a 
subrule or block into a root.
   5. You can use -> multiple times within a rule, but each time 
it sets the output of the entire rule.  But you can refer to a 
tree built by an earlier use of ->, so you can build up recursive 
trees very easily.



More information about the antlr-interest mailing list