[antlr-interest] flat AST tree

Torsten Curdt tcurdt at vafer.org
Sun Aug 24 04:32:22 PDT 2008


> What hierarchy?  How can ANTLR possibly know which of the tokens is  
> the logical root of the tree, and of the subtrees?  Operators, for  
> example, commonly appear infix, while things like class declarations  
> appear prefix, with additional modifiers optionally prefixed and  
> suffixed to that.  There's just no way it can create a sensible  
> structure without information from the grammar author, so it doesn't  
> try.

Well, of course there is an implicit hierarchy

  ruleA: ruleB;

  ruleB: ruleC;

Is basically:

  ruleA -> ruleB -> ruleC

That is what ANTLRWorks is displaying and that is what I expected the  
AST to be. That it might not be the desired final structure - but  
that's a different thing.

> You can either use a full rewrite notation, as you showed in your  
> original email, or you can use a more concise representation for the  
> simple case where there's only one root and you want the tokens to  
> appear in order.
>
> For example, the rewrite you posted:
>  classDeclaration : 'class' Identifier ( 'extends' identifierList)?
> classBody -> ^('class' Identifier ( 'extends' identifierList)?
> classBody) ;
>
> Could have just been written like this:
>  classDeclaration : 'class'^ Identifier ('extends' identifierList)?  
> classBody;
>
> Either way, it will produce a tree like this:
>  ^('class' Identifier (contents of classBody...))
> or this:
>  ^('class' Identifier 'extends' (contents of identifierList...)  
> (contents of classBody...))
>
> You can extend this to modify the tree a bit; for example to put the  
> 'extends' clause into a subtree if it's present, you can use this:
>  classDeclaration : 'class'^ Identifier baseClass? classBody;
>  baseClass : 'extends'^ identifierList;
>
> If you wanted to change the order things appeared in, though, or  
> insert additional tokens not present in the input (imaginary  
> tokens), then you have to use a -> rewrite.  (You can omit tokens  
> present in the input with !, though.)

I see

> The things to remember are:
>  1. You can either use a -> rewrite *or* the ^ and ! operators  
> within any given rule -- never both.  (You get *really* obscure  
> errors if you violate this.)  Different rules can use different  
> types, though.
>  2. -> gives you more flexibility but you need to restate things,  
> and you need to make sure that the cardinality on both sides matches
>  3. If you're not using a -> rewrite, then you can have at most one  
> ^ within any given rule.
>  4. The root node must be a single token.  You can't make a subrule  
> or block into a root.
>  5. You can use -> multiple times within a rule, but each time it  
> sets the output of the entire rule.  But you can refer to a tree  
> built by an earlier use of ->, so you can build up recursive trees  
> very easily.

Thanks for summarizing that!

cheers
--
Torsten


More information about the antlr-interest mailing list