[antlr-interest] Can't generate the AST I want

John B. Brodie jbb at acm.org
Mon Aug 8 15:43:58 PDT 2011


On Mon, 2011-08-08 at 19:41 +0000, Scott Smith wrote:
> I am writing a grammar to generate ASTs.  There are operators called "AND" and "OR" (defined in the lexer) in the language.  I want my parser to produce ASTs that group together all of the terms which are using the same operator.
> 
> Also, there can be cases where there is no operator given.  In that case, depending on the program configuration, it may be "AND" or "OR".  I've called it DFLT_OP for the time being.  Here are some examples:
> 
> (a OR b OR c) -> (OR a b c)
> a b -> (DFLT_OP a b)                                                       // no operator
> a AND (b OR c) AND d ->(AND a (OR b c) d)
> a AND b OR c AND d ->(AND (OR (AND a b) c) d)  // I think this one is right; I don't need to reorder terms because AND was used twice

i do not think this is correct.
you have (a AND b OR c AND d) as (((a AND b) OR c) AND d)
but i think most people would expect it to be ((a AND b) OR (c AND d))

e.g. AND has higher precedence than OR (i think)

> 
> The grammar I've generated (somewhat simplified) is below.  I think everything is pretty vanilla except for the definition of "expression":
> 
> grammar testGr;
> 
> options {
>   language = Java;
>   output = AST;
>   ASTLabelType = CommonTree;
> }
> 
> tokens {
>   DFLT_OP;
> }
> 
> @header {
>   package a.b.c;
> }
> 
> @lexer::header {
>   package a.b.c;
> }
> 
> filter     :    expression EOF ;
> 
> term :    IDENTIFIER
>      |    '('! expression ')'!
>      ;
> 
> // parsing "(a AND b AND c)" (w/o quotes)
> 
> // ATTEMPT 1
> // This gives (AND (AND a b) c).  This is correct, but what I
> // really want is (AND a b c)
> //expression
> //   :    term ((AND^ | OR^)? term)*
> //   ;
> 
> // ATTEMPT2
> // This doesn't compile due to "recursive rule invocations".  I'm
> // also not thrilled the term order changes
> //expression
> //   :    (id1=term OR)+ id2=term -> ^(OR id2 id1+)
> //   |    (id1=term AND)+ id2=term -> ^(AND id2 id1+)
> //   |    id1=term id2=term+  -> ^(DFLT_OP id1 id2+)
> //   |    term -> term
> //   ;
> 
> // ATTEMPT3
> // This compiles but appears to produce (AND a c).  No "b".

you want to use the `+=` meta operator (not `=`) when collecting a list
of elements on the left of the `->` meta operator. e.g. id2+=term not
id2=term. using the = on a list just gets the last element of the list
as you have observed.

> // expression
> //options {
> //   backtrack=true;
> //}
> //   :    id1=term (OR id2=term)+ -> ^(OR $id1 $id2+)
> //   |    id1=term (AND id2=term)+ -> ^(AND $id1 $id2+)
> //   |    id1=term id2=term+  -> ^(DFLT_OP $id1 $id2+)
> //   |    term -> term
> //   ;
> //
> 
> // This doesn't compile due to "recursive rule invocations".
> expression
>      :    id1=term OR id2=term (OR id3=term)+ -> ^(OR $id1 $id2 $id3+)
>      |    id1=term AND id2=term (AND id3=term)+ -> ^(AND $id1 $id2 $id3+)
>      |    id1=term id2=term+  -> ^(DFLT_OP $id1 $id2+)
>      |    term -> term
>      ;
> 
> AND  :    'AND' | '&&' ;
> OR   :    'OR' | '||' ;
> IDENTIFIER :    LETTER (LETTER | NUMBER)* ;
> fragment LETTER :    LOWER | UPPER ;
> fragment LOWER  :    'a'..'z' ;
> fragment UPPER  :    'A'..'Z' ;
> fragment NUMBER :    '0'..'9' ;
> WS  : (' ' | '\t' | '\n' | '\r') {$channel=HIDDEN; } ;
> 
> Is there a way to tell ANTLR what I want to do?  How should I write the expression?
> 
> Any thoughts about the minor problem of substituting AND or OR instead of the DFLT_OP.  I figured out how to pass a Boolean to my parser which tells it which one of these is the correct one.  Can I write something like:
> 
>                 ({dftlOpAND ? AND : OR} $id1 $id2)

I am not sure that you can do this in constructing a root of a sub-tree.

attached please find a complete grammar that solves all 3 of the
problems identified:
        a) precedence of AND over OR
        b) making AND and OR n-ary trees rather than binary
        c) inserting an imaginary AND or OR as root when no connective
           operator is present.

Hope this helps...
   -jbb

-------------- next part --------------
grammar Test;

options {
   output = AST;
   ASTLabelType = CommonTree;
}

@members {

   boolean dfltOpAND = true;

   private static final String [] x = new String[] {
      "a",                      // -> a
      "(a OR b OR c)",          // -> (OR a b c)
      "a AND (b OR c) AND d",   // -> (AND a (OR b c) d)
      "a AND b OR c AND d",     // -> (OR (AND a b) (AND c d))
      "a b",
      "a b c d",
      "a (b OR c) d",
      "a b OR c d",
 };

   public static void main(String [] args) {
      for( int i = 0; i < x.length; ++i ) {
         try {
            System.out.println("about to parse:`"+x[i]+"`");

            TestLexer lexer = new TestLexer(new ANTLRStringStream(x[i]));
            CommonTokenStream tokens = new CommonTokenStream(lexer);

            // System.out.format("dump of the token stream:\%n");
            // int j = 0;
            // boolean looping = true;
            // while( looping ) {
            //    Token token = lexer.nextToken();
            //    int typ = token.getType();
            //    System.out.format("\%d: type = \%s, text = `\%s`\%s\%n",
            //                      j++,
            //                      typ==EOF?"EOF":tokenNames[typ],
            //                      token.getText(),
            //                      token.getChannel()==HIDDEN?" (HIDDEN)":"");
            //    looping = typ != EOF;
            // }
            // lexer.reset();
            // System.out.format("now performing the parse\n");

            TestParser parser = new TestParser(tokens);
            TestParser.test_return p_result = parser.test();

            CommonTree ast = p_result.tree;
            if( ast == null ) {
               System.out.println("resultant tree: is NULL");
            } else {
               System.out.println("resultant tree: " + ast.toStringTree());
            }
            System.out.println();
         } catch(Exception e) {
            e.printStackTrace();
         }
      }
   }
}

test : filter ;

filter     :    expression EOF ;

expression :
      (disjunction -> disjunction)
      ((disjunctive_op x+=disjunction)+ -> ^(disjunctive_op $expression $x+))?
   ;
disjunctive_op :
      ( OR -> OR )
   |  ( ( { ! dfltOpAND }?=>/*empty*/ ) -> OR[input.LT(1), "OR"] )
   ;

disjunction :
      (conjunction -> conjunction)
      ((conjunctive_op x+=conjunction)+ -> ^(conjunctive_op $disjunction $x+))?
   ;
conjunctive_op :
      ( AND -> AND )
   |  ( ( { dfltOpAND }?=>/*empty*/ ) -> AND[input.LT(1), "AND"] )
   ;

conjunction : primary ;

primary : IDENTIFIER | '('! expression ')'! ;


AND  :    'AND' | '&&' ;
OR   :    'OR' | '||' ;
IDENTIFIER :    LETTER (LETTER | NUMBER)* ;
fragment LETTER :    LOWER | UPPER ;
fragment LOWER  :    'a'..'z' ;
fragment UPPER  :    'A'..'Z' ;
fragment NUMBER :    '0'..'9' ;
WS  : (' ' | '\t' | '\n' | '\r') {$channel=HIDDEN; } ;


More information about the antlr-interest mailing list