[antlr-interest] Separating multi-token entries in ASTs

Thu Jun 28 09:19:29 PDT 2007

Hi,
I'm looking for a little help retrieving string lists from ASTs, and keeping
them separate. I will try to explain the issue better here.
I have created a query translator with ANTLR v3.
A simple example query is TAG=(VALUE), where VALUE is one or more terms, e.g.
title=(call of the wild)
The translation of this will be title:"call of the wild".
The issue I have is when VALUE contains a boolean operator that operates on
multi-term operands,
e.g. title=(Call of the Wild AND Where the Wild Things are).
What I want this to translate to is title:AND("Call of the Wild", "Where the
Wild Things are").
What happens instead is that, when I traverse the tree, the term list keeps
building up, so my translation becomes:

title:AND("Call of the Wild Where the Wild Things", "are")

In ANTLRWorks I can see that a sensible AST is created, but I'm not sure how
to tell the TreeTraverser to stop collecting terms after the first title,
because nothing separates it from the second title once their tokens are
added to the tree.
I realize I am asking a lot by dumping these grammars here and hoping
someone will be inclined to navigate them, but any help would be greatly
appreciated.
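
To make the symptom concrete, here is a minimal sketch in plain Java (no
ANTLR involved). It assumes the words of both titles end up as one flat list
of children under the AND node, and that a walker greedily treats every child
but the last as a term. The class and method names (TermGrouping, renderFlat,
renderGrouped) are hypothetical and exist only for this illustration; the
second method shows the output I am after when each operand keeps its own
group of words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: models the shape of the data, not the ANTLR runtime.
final class TermGrouping {

    // Flat case: all words are siblings, so a greedy walker takes every
    // child except the last as "terms" and the last one as the "value".
    static String renderFlat(String tag, String op, List<String> children) {
        int last = children.size() - 1;
        String terms = String.join(" ", children.subList(0, last));
        return tag + ":" + op + "(\"" + terms + "\", \"" + children.get(last) + "\")";
    }

    // Grouped case: each operand carries its own list of words, so the
    // boundary between the two titles survives.
    static String renderGrouped(String tag, String op, List<List<String>> operands) {
        List<String> quoted = new ArrayList<>();
        for (List<String> words : operands) {
            quoted.add("\"" + String.join(" ", words) + "\"");
        }
        return tag + ":" + op + "(" + String.join(", ", quoted) + ")";
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("Call", "of", "the", "Wild");
        List<String> second = Arrays.asList("Where", "the", "Wild", "Things", "are");

        // Flattening both titles into one child list loses the boundary.
        List<String> flat = new ArrayList<>(first);
        flat.addAll(second);

        // prints title:AND("Call of the Wild Where the Wild Things", "are")
        System.out.println(renderFlat("title", "AND", flat));

        // prints title:AND("Call of the Wild", "Where the Wild Things are")
        System.out.println(renderGrouped("title", "AND", Arrays.asList(first, second)));
    }
}
```

The only difference between the two calls is whether the words of each
operand stay grouped, which is the information that seems to get lost once
the tokens are added to the tree.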

Here is my parser grammar:

grammar QueryParser;

options{
    k=4;
    output=AST;
    ASTLabelType=CommonTree;
}

start   : ( query  {System.out.println("AST:\n"+$query.tree.toStringTree());} )+ ;
query   : field (WS!)* (BOOL_OP^ (WS!)* query)*  ;

field   : tag '=' value  -> ^('=' tag value)
        | qid
        ;

value   : term (WS)* BOOL_OP (WS)* value -> ^(BOOL_OP term value)
        | term
        | LPAREN! value RPAREN!
        ;

tag     : WCHAR;

term    : WCHAR (WS! WCHAR)* ;

qid     : '#'!DIGIT ;

BOOL_OP : 'AND'|'OR'|'NOT';
DIGIT   : ('0'..'9');
WS      : (' '|'\t'|'\r'|'\n')+ ;
LPAREN  : '(' ;
RPAREN  : ')' ;
QUOTE   : '"';
WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;

Here is my TreeGrammar:

tree grammar QueryTree;

options{
    tokenVocab=QueryParser; //use the vocabulary from the parser
    ASTLabelType=CommonTree; //The kind of tree we are walking
    output=template;
}

// START:query
start[]
@init
{

}
        :  (q+=query)+ -> template1(finalQuery={$q});

query   : field -> template2(express={$field.st})

        | ^(BOOL_OP a=query b=query ) -> template3(op={$BOOL_OP.text},
                                                query1={$a.st},
                                                query2={$b.st})
        ;

field : ^('=' tag value) -> template4(tag={$tag.text},value={$value.st})
        | ^('=' tag (t+=term)+) -> template41(tag={$tag.text},terms={$t})
        | DIGIT -> storedQuery(qid={$DIGIT.text})
        ;

value   : WCHAR -> template5(val ={$value.text})
        | ^(BOOL_OP (a+=term)+ b=value) ->
                valueTemplate(wokOp={$BOOL_OP.text},terms={$a},values={$b.st})
        ;

term    : WCHAR -> template6(t ={$term.text});

tag     : WCHAR ;

And for what it's worth, here are the templates referenced:

template1(finalQuery) ::= "<finalQuery>"

template2(express) ::= "<express>"

template3(op, query1, query2) ::= "<op>(<query1>,<query2>)"

template4(tag, value) ::= "<tag>:<value; separator=\" \">"

template41(tag, terms) ::= "<tag>:(\"<terms; separator=\" \">\")"

template5(val) ::= "<val>"

wokOpTemplate(wokOp) ::= "<wokOp>"

template6(t) ::= "<t>"

valueTemplate(wokOp, terms, values) ::= "<wokOp>(<terms>,<values>)"

storedQuery(qid) ::= "<qid>"

Thanks again for having a look!

Ted