[antlr-interest] How do you build this AST?

Tue Feb 17 06:17:50 PST 2009

If you have the option output=AST specified, then your rewrite rules are the default behavior. Unnecessary rewrites will incur a significant performance penalty since they are not optimized. The penalty is masked if the grammar is not properly left-factored, but it should still be avoided. The ^,! Notation has better performance than rewrites as well. IE, this:

root     :    subroot1? subroots* -> subroot1? subroots* ;
subroots : subroot2 | subroot3;

Is the same as both this:

root     : subroot1? subroots*;
subroots : subroot2 | subroot3;

And this:

root : subroot1? (subroot2 | subroot3)*;

If you absolutely must have the Root node in your output tree, use this:

root : subroot1? (subroots+=subroot2 | subroots+=subroot3)* -> ^(Root subroot1? subroots*);

Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Johannes Luber
Sent: Tuesday, February 17, 2009 5:46 AM
To: Gabriel Petrovay; antlr-interest at antlr.org
Subject: Re: [antlr-interest] How do you build this AST?

> Hi all,
> 
> (In this e-mail I talk about result obtained with ANTLR 3.1.1)
> 
> Below you have a simple sample grammar. I what to get from this
> grammar an AST as illustrates below for the following input:
> "example child leaf other child leaf child leaf done".
> 
> I am not asking you to check is the grammar is correct. Just give me
> please an example on how should I write this in antlr3 grammar in
> order to get a tree like below. Or give me an example for one rule and
> I'll find the way for the others.
> 
> It seems that the output=AST option is not enough. Now I only get a
> CommonTree node having 9 CommonTree children each representing one of
> the lexer tokens.
> 
> (I have found a solution though, but there is a problem with it. See
> bottom of email. Is this the way to do it?)
> 
> //---------------------------------
> grammar example ;
> options {
> output=AST;
> }
> 
> // parser rules
> root      :    subroot1? (subroot2 | subroot3) ;
> subroot1  :    EXAMPLE ;
> subroot2  :    child1 child2 ;
> child1    :    CHILD subchild ;
> child2    :    OTHER CHILD subchild ;
> subchild  :    LEAF ;
> subroot3  :    child1 DONE;
> 
> // lexer rules
> EXAMPLE   :    'example' ;
> LEAF      :    'leaf' ;
> CHILD     :    'child' ;
> OTHER     :    'OTHER' ;
> DONE      :    'done' ;
> WS        :    ''\t' | '\s' | '\r' | '\n' {$channel=HIDDEN} ;
> //---------------------------------
> 
> !USE MONOSPACE FONTS TO PROPERLY SEE THE BELOW TREE!
>        ------ root ------
>      /         |          \
>     /          |           \
> subroot1    subroot2      subroot3
>            /      \          |
>           /        \       child1
>       child1      child2
>          |          |
>      subchild    subchild
> 
> 
> Thank you!
> (very much)
> 
> 
> 
> Possible solution? (Problem described after the grammar.)
> 
> //---------------------------------
> grammar example ;
> options {
> output=AST;
> }
> 
> // imaginary tokens (nodes)
> tokens {
> Root;
> Subroot1;
> Subroot2;
> Subroot3;
> Child1;
> Child2;
> Subchild;
> }
> 
> // parser rules
> root      :    subroot1? (subroot2 | subroot3) -> ^(Root subroot1?
> subroot2? subroot3?);
> subroot1  :    EXAMPLE;
> subroot2  :    child1 child2 -> ^(Subroot2 child1 child2);
> child1    :    CHILD subchild -> ^(Child1 subchild);
> child2    :    OTHER CHILD subchild -> ^(Child2 subchild);
> subchild  :    LEAF;
> subroot3  :    child1 DONE -> ^(Subroot3 child1);
> 
> // lexer rules
> EXAMPLE   :    'example' ;
> LEAF      :    'leaf' ;
> CHILD     :    'child' ;
> OTHER     :    'OTHER' ;
> DONE      :    'done' ;
> WS        :    ''\t' | '\s' | '\r' | '\n' {$channel=HIDDEN} ;
> //---------------------------------
> 
> 
> The problem with the above grammar apeares with the following example.
> Assume that 'root' has the following rule:
> root      :    subroot1? (subroot2 | subroot3)* -> ??? how to
> transform this ??? ;
> This rule allows any sequence of subroot2's and subroot3's.
> 
> The problem with this transformation is that the '|' operator is not
> allowed in the rule transformation. The following is illegal:
> root      :    subroot1? (subroot2 | subroot3)* -> subroot1? (subroot2
> | subroot3)* ;
> 
> If I write it like:
> root      :    subroot1? (subroot2 | subroot3)* -> subroot1? subroot2*
> subroot3* ;
> then I loose the ordering.
> 
> How can I solve this? Am I on a wrong track?

Try

root      :    subroot1? subroots* -> subroot1? subroots* ;

subroots : subroot2 | subroot3;

Johannes
-- 
Jetzt 1 Monat kostenlos! GMX FreeDSL - Telefonanschluss + DSL 
für nur 17,95 Euro/mtl.!* http://dsl.gmx.de/?ac=OM.AD.PD003K11308T4569a

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address