[antlr-interest] (Re-) Building the parse tree in Java

Mon Feb 28 03:07:08 PST 2011

Hello,

I've built a grammar using ANTLRWorks 1.4.2, and, after it worked fine, have
generated the respective java code and included it into a mini project to
test my grammar in java now.

So far, it works fine, I'm just searching for hours how to (re-) build the
Parse Tree which is shown in the ANTLRWorks tool (so the tree which shows
how the rules are fired and the input string is gradually split into its
tokens). I anyway got it to fetch the leaves of the tree (tokens) by
performing this code:

<code>

   public static void main(String[] args) throws Exception {

        ANTLRStringStream in = new
ANTLRStringStream("((InnerPaths;(SelfPath;(SelfPath;(Nametoken;Trigram;Set_Average);Set_Average),(SelfNode;(Nametoken;Trigram,UserSyn;Max;Set_Max),(Name;UserSyn;Set_Average);Max;Set_Average);Average;Set_Average);(Both,Threshold(0.3))),(DownPaths;(SelfNode;(Nametoken;Trigram;Set_Average),(Datatype;DatatypeSimilarity;Set_Average);Weighted(0.7,0.3);Set_Average);(Both,Threshold(0.3))))");

        ComaWorkFlowLexer lexer = new ComaWorkFlowLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ComaWorkFlowParser parser = new ComaWorkFlowParser(tokens);
        ComaWorkFlowParser.coma_return result = parser.coma();

        CommonTree tree = (CommonTree) result.getTree();

        preOrder(tree, 0);   // Show the ground level

    }

    private static void preOrder(CommonTree tree, int depth) {
        for(int i = 0; i < depth; i++) {
            System.out.print("- ");
        }
        System.out.println(">   "+ tree + "    ::    " +
ComaWorkFlowParser.tokenNames[tree.getType()]);
        java.util.List children = tree.getChildren();
        if(children == null) return;
        for(Object o : children) {
            preOrder((CommonTree)o, depth+1);
        }
    }

</code>

This is a nice beginning, however, it would be important for us to also get
(or build) the tree itself, so that we see the parts of the input string
which are on level 1, 2 etc. (the rules that were fired).

Maybe I give an example to make myself more clear: For instance we have some
paragraphs, sentences, words and different punctuations, so we have a
paragraph rule, sentence rule, question rule, some word rules etc. The input
might be: <The weather is nice. Is it nice? It is nice.>

What I'm getting if I perform the code above is something like this: 

The - WORD; weather - WORD; is - WORD; nice - WORD; . - PUNCTUATION; Is -
WORD; it - WORD; nice - WORD; ? - PUNCTUATION; It - WORD; is - WORD; nice -
WORD; . - PUNCTUATION.

So I get the "ground level" (leaves) of the tree (the words, dots etc.).
However, my goal is it to also get the upper levels, like SENTENCE QUESTION
SENTENCE (so the rules which where fired and which are also shown in the
ANTLR Parse Tree),

Our goal is then to say something like this: "The input constists of two
paragraphs. The first paragraph consists of three sentences, a question, and
two further sentences. The questions consists of the 7 words bla, bla, ...
and a question mark. ...". 

That is, we also want to have the opportunity to resolve a part of the tree,
e.g. resolve QUESTION and then get:  Is - WORD, it - WORD, nice - WORD, ? -
PUNCTUATION. 

It will certainly be possible to perform this (ANTLRWorks does it anyway),
but I do not know how. I hope someone can help me. I'm completely new to
ANTLR and already put much work and time in it, yet do not know how to go
on.

Thank you very much.
Patrick

-- 
View this message in context: http://antlr.1301665.n2.nabble.com/Re-Building-the-parse-tree-in-Java-tp6072780p6072780.html
Sent from the ANTLR mailing list archive at Nabble.com.