[antlr-interest] How do you build this AST?

Tue Feb 17 03:14:18 PST 2009

Hi all,

(In this e-mail I talk about result obtained with ANTLR 3.1.1)

Below you have a simple sample grammar. I what to get from this
grammar an AST as illustrates below for the following input:
"example child leaf other child leaf child leaf done".

I am not asking you to check is the grammar is correct. Just give me
please an example on how should I write this in antlr3 grammar in
order to get a tree like below. Or give me an example for one rule and
I'll find the way for the others.

It seems that the output=AST option is not enough. Now I only get a
CommonTree node having 9 CommonTree children each representing one of
the lexer tokens.

(I have found a solution though, but there is a problem with it. See
bottom of email. Is this the way to do it?)

//---------------------------------
grammar example ;
options {
output=AST;
}

// parser rules
root      :    subroot1? (subroot2 | subroot3) ;
subroot1  :    EXAMPLE ;
subroot2  :    child1 child2 ;
child1    :    CHILD subchild ;
child2    :    OTHER CHILD subchild ;
subchild  :    LEAF ;
subroot3  :    child1 DONE;

// lexer rules
EXAMPLE   :    'example' ;
LEAF      :    'leaf' ;
CHILD     :    'child' ;
OTHER     :    'OTHER' ;
DONE      :    'done' ;
WS        :    ''\t' | '\s' | '\r' | '\n' {$channel=HIDDEN} ;
//---------------------------------

!USE MONOSPACE FONTS TO PROPERLY SEE THE BELOW TREE!
       ------ root ------
     /         |          \
    /          |           \
subroot1    subroot2      subroot3
           /      \          |
          /        \       child1
      child1      child2
         |          |
     subchild    subchild

Thank you!
(very much)

Possible solution? (Problem described after the grammar.)

//---------------------------------
grammar example ;
options {
output=AST;
}

// imaginary tokens (nodes)
tokens {
Root;
Subroot1;
Subroot2;
Subroot3;
Child1;
Child2;
Subchild;
}

// parser rules
root      :    subroot1? (subroot2 | subroot3) -> ^(Root subroot1?
subroot2? subroot3?);
subroot1  :    EXAMPLE;
subroot2  :    child1 child2 -> ^(Subroot2 child1 child2);
child1    :    CHILD subchild -> ^(Child1 subchild);
child2    :    OTHER CHILD subchild -> ^(Child2 subchild);
subchild  :    LEAF;
subroot3  :    child1 DONE -> ^(Subroot3 child1);

// lexer rules
EXAMPLE   :    'example' ;
LEAF      :    'leaf' ;
CHILD     :    'child' ;
OTHER     :    'OTHER' ;
DONE      :    'done' ;
WS        :    ''\t' | '\s' | '\r' | '\n' {$channel=HIDDEN} ;
//---------------------------------

The problem with the above grammar apeares with the following example.
Assume that 'root' has the following rule:
root      :    subroot1? (subroot2 | subroot3)* -> ??? how to
transform this ??? ;
This rule allows any sequence of subroot2's and subroot3's.

The problem with this transformation is that the '|' operator is not
allowed in the rule transformation. The following is illegal:
root      :    subroot1? (subroot2 | subroot3)* -> subroot1? (subroot2
| subroot3)* ;

If I write it like:
root      :    subroot1? (subroot2 | subroot3)* -> subroot1? subroot2*
subroot3* ;
then I loose the ordering.

How can I solve this? Am I on a wrong track?

-- 
MSc Gabriel Petrovay
MCSA, MCDBA, MCAD
Mobile: +41(0)787978034