[antlr-interest] How do you build this AST?
Gabriel Petrovay
gabriel.petrovay at 28msec.com
Tue Feb 17 03:14:18 PST 2009
Hi all,
(In this e-mail I talk about results obtained with ANTLR 3.1.1.)
Below you have a simple sample grammar. I want to get from this
grammar an AST as illustrated below for the following input:
"example child leaf other child leaf child leaf done".
I am not asking you to check whether the grammar is correct. Please
just give me an example of how I should write this in an ANTLR 3
grammar in order to get a tree like the one below. Or give me an
example for one rule and I'll find the way for the others.
It seems that the output=AST option alone is not enough. Right now I
only get a single flat CommonTree node with 9 CommonTree children, one
for each of the lexer tokens.
(I have found a possible solution, but there is a problem with it; see
the bottom of this e-mail. Is this the way to do it?)
//---------------------------------
grammar example ;
options {
output=AST;
}
// parser rules
root : subroot1? (subroot2 | subroot3) ;
subroot1 : EXAMPLE ;
subroot2 : child1 child2 ;
child1 : CHILD subchild ;
child2 : OTHER CHILD subchild ;
subchild : LEAF ;
subroot3 : child1 DONE;
// lexer rules
EXAMPLE : 'example' ;
LEAF : 'leaf' ;
CHILD : 'child' ;
OTHER : 'other' ;
DONE : 'done' ;
WS : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;
//---------------------------------
!USE MONOSPACE FONTS TO PROPERLY SEE THE BELOW TREE!
             ------ root ------
            /        |         \
           /         |          \
    subroot1     subroot2     subroot3
                 /      \         |
                /        \      child1
           child1      child2
              |           |
          subchild    subchild
Thank you!
(very much)
Possible solution? (Problem described after the grammar.)
//---------------------------------
grammar example ;
options {
output=AST;
}
// imaginary tokens (nodes)
tokens {
Root;
Subroot1;
Subroot2;
Subroot3;
Child1;
Child2;
Subchild;
}
// parser rules
root : subroot1? (subroot2 | subroot3) -> ^(Root subroot1?
subroot2? subroot3?);
subroot1 : EXAMPLE;
subroot2 : child1 child2 -> ^(Subroot2 child1 child2);
child1 : CHILD subchild -> ^(Child1 subchild);
child2 : OTHER CHILD subchild -> ^(Child2 subchild);
subchild : LEAF;
subroot3 : child1 DONE -> ^(Subroot3 child1);
// lexer rules
EXAMPLE : 'example' ;
LEAF : 'leaf' ;
CHILD : 'child' ;
OTHER : 'other' ;
DONE : 'done' ;
WS : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;
//---------------------------------
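A side note on the grammar above: subroot1 and subchild have no rewrite, so the Subroot1 and Subchild imaginary tokens are declared but never used, and those two rules simply return the plain EXAMPLE and LEAF token nodes. A minimal sketch of how the two rules could be wrapped as well, assuming the original token should be kept as a child of the new node (that choice is an assumption, not something the desired tree dictates):

```antlr
// Sketch only: drop-in replacements for the two rules without a rewrite.
// Each rule now returns an imaginary node with the matched token as its child.
subroot1 : EXAMPLE -> ^(Subroot1 EXAMPLE) ;
subchild : LEAF    -> ^(Subchild LEAF) ;
```

Writing "-> Subroot1" or "-> Subchild" instead would produce a bare imaginary node and drop the matched token from the tree.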
The problem with the above grammar appears with the following example.
Assume that 'root' has the following rule:
root : subroot1? (subroot2 | subroot3)* -> ??? how to
transform this ??? ;
This rule allows any sequence of subroot2's and subroot3's.
The problem is that the '|' operator is not allowed on the right-hand
side of a rewrite ('->'). The following is illegal:
root : subroot1? (subroot2 | subroot3)* -> subroot1? (subroot2
| subroot3)* ;
If I write it like:
root : subroot1? (subroot2 | subroot3)* -> subroot1? subroot2*
subroot3* ;
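then I lose the ordering: all subroot2 subtrees are emitted before all
subroot3 subtrees, regardless of their order in the input.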
How can I solve this? Am I on a wrong track?
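One approach that might work here is to move the alternation into a helper rule, so that the rewrite of 'root' only mentions a single repeated element and the input order is preserved. The sketch below is not a confirmed answer from the list; the rule name subrootAny is hypothetical, and the lexer assumes lowercase 'other' to match the sample input.

```antlr
grammar example ;

options {
  output=AST;
}

// imaginary tokens (nodes)
tokens {
  Root;
  Subroot2;
  Subroot3;
  Child1;
  Child2;
}

// parser rules
root       : subroot1? subrootAny* -> ^(Root subroot1? subrootAny*) ;

// Hypothetical helper rule: it has no rewrite, so the tree built by
// whichever alternative matched is passed through unchanged.
subrootAny : subroot2 | subroot3 ;

subroot1   : EXAMPLE ;
subroot2   : child1 child2 -> ^(Subroot2 child1 child2) ;
subroot3   : child1 DONE -> ^(Subroot3 child1) ;
child1     : CHILD subchild -> ^(Child1 subchild) ;
child2     : OTHER CHILD subchild -> ^(Child2 subchild) ;
subchild   : LEAF ;

// lexer rules
EXAMPLE : 'example' ;
LEAF    : 'leaf' ;
CHILD   : 'child' ;
OTHER   : 'other' ;
DONE    : 'done' ;
WS      : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;
```

For the sample input, the resulting tree should have the shape (Root example (Subroot2 (Child1 leaf) (Child2 leaf)) (Subroot3 (Child1 leaf))), with the subroot2 and subroot3 subtrees in input order. An alternative that may also work is a shared list label, as in "(s+=subroot2 | s+=subroot3)* -> ^(Root subroot1? $s*)", but that variant is untested here.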
--
MSc Gabriel Petrovay
MCSA, MCDBA, MCAD
Mobile: +41(0)787978034