[antlr-interest] flat AST tree
Gavin Lambert
antlr at mirality.co.nz
Sat Aug 23 23:52:30 PDT 2008
At 15:13 24/08/2008, Torsten Curdt wrote:
>When I now look at the AST it appears to be totally flat
[...]
>But that totally blows up the grammar. Why isn't it just
>the same hierarchy as the input tree?
What hierarchy? How can ANTLR possibly know which of the tokens
is the logical root of the tree, and of the subtrees? Operators,
for example, commonly appear infix, while things like class
declarations appear prefix, with additional modifiers optionally
prefixed and suffixed to that. There's just no way it can create
a sensible structure without information from the grammar author,
so it doesn't try.
You can either use a full rewrite notation, as you showed in your
original email, or you can use a more concise representation for
the simple case where there's only one root and you want the
tokens to appear in order.
For example, the rewrite you posted:
classDeclaration : 'class' Identifier ( 'extends'
identifierList)?
classBody -> ^('class' Identifier ( 'extends' identifierList)?
classBody) ;
Could have just been written like this:
classDeclaration : 'class'^ Identifier ('extends'
identifierList)? classBody;
Either way, it will produce a tree like this:
^('class' Identifier (contents of classBody...))
or this:
^('class' Identifier 'extends' (contents of identifierList...)
(contents of classBody...))
You can extend this to modify the tree a bit; for example to put
the 'extends' clause into a subtree if it's present, you can use
this:
classDeclaration : 'class'^ Identifier baseClass? classBody;
baseClass : 'extends'^ identifierList;
If you wanted to change the order things appeared in, though, or
insert additional tokens not present in the input (imaginary
tokens), then you have to use a -> rewrite. (You can omit tokens
present in the input with !, though.)
The things to remember are:
1. You can either use a -> rewrite *or* the ^ and ! operators
within any given rule -- never both. (You get *really* obscure
errors if you violate this.) Different rules can use different
types, though.
2. -> gives you more flexibility but you need to restate
things, and you need to make sure that the cardinality on both
sides matches
3. If you're not using a -> rewrite, then you can have at most
one ^ within any given rule.
4. The root node must be a single token. You can't make a
subrule or block into a root.
5. You can use -> multiple times within a rule, but each time
it sets the output of the entire rule. But you can refer to a
tree built by an earlier use of ->, so you can build up recursive
trees very easily.
More information about the antlr-interest
mailing list