[antlr-interest] Can't generate the AST I want

Scott Smith ssmith at mainstreamdata.com
Mon Aug 8 12:41:05 PDT 2011


I am writing a grammar to generate ASTs.  There are operators called "AND" and "OR" (defined in the lexer) in the language.  I want my parser to produce ASTs that group together all of the terms which are using the same operator.

Also, there can be cases where there is no operator given.  In that case, depending on the program configuration, it may be "AND" or "OR".  I've called it DFLT_OP for the time being.  Here are some examples:

(a OR b OR c) -> (OR a b c)
a b -> (DFLT_OP a b)                                                       // no operator
a AND (b OR c) AND d ->(AND a (OR b c) d)
a AND b OR c AND d ->(AND (OR (AND a b) c) d)  // I think this one is right; I don't need to reorder terms because AND was used twice

The grammar I've generated (somewhat simplified) is below.  I think everything is pretty vanilla except for the definition of "expression":

grammar testGr;

options {
  language = Java;
  output = AST;
  ASTLabelType = CommonTree;
}

tokens {
  DFLT_OP;
}

@header {
  package a.b.c;
}

@lexer::header {
  package a.b.c;
}

filter     :    expression EOF ;

term :    IDENTIFIER
     |    '('! expression ')'!
     ;

// parsing "(a AND b AND c)" (w/o quotes)

// ATTEMPT 1
// This gives (AND (AND a b) c).  This is correct, but what I
// really want is (AND a b c)
//expression
//   :    term ((AND^ | OR^)? term)*
//   ;

// ATTEMPT2
// This doesn't compile due to "recursive rule invocations".  I'm
// also not thrilled the term order changes
//expression
//   :    (id1=term OR)+ id2=term -> ^(OR id2 id1+)
//   |    (id1=term AND)+ id2=term -> ^(AND id2 id1+)
//   |    id1=term id2=term+  -> ^(DFLT_OP id1 id2+)
//   |    term -> term
//   ;

// ATTEMPT3
// This compiles but appears to produce (AND a c).  No "b".
// expression
//options {
//   backtrack=true;
//}
//   :    id1=term (OR id2=term)+ -> ^(OR $id1 $id2+)
//   |    id1=term (AND id2=term)+ -> ^(AND $id1 $id2+)
//   |    id1=term id2=term+  -> ^(DFLT_OP $id1 $id2+)
//   |    term -> term
//   ;
//

// This doesn't compile due to "recursive rule invocations".
expression
     :    id1=term OR id2=term (OR id3=term)+ -> ^(OR $id1 $id2 $id3+)
     |    id1=term AND id2=term (AND id3=term)+ -> ^(AND $id1 $id2 $id3+)
     |    id1=term id2=term+  -> ^(DFLT_OP $id1 $id2+)
     |    term -> term
     ;

AND  :    'AND' | '&&' ;
OR   :    'OR' | '||' ;
IDENTIFIER :    LETTER (LETTER | NUMBER)* ;
fragment LETTER :    LOWER | UPPER ;
fragment LOWER  :    'a'..'z' ;
fragment UPPER  :    'A'..'Z' ;
fragment NUMBER :    '0'..'9' ;
WS  : (' ' | '\t' | '\n' | '\r') {$channel=HIDDEN; } ;

Is there a way to tell ANTLR what I want to do?  How should I write the expression?

Any thoughts about the minor problem of substituting AND or OR instead of the DFLT_OP.  I figured out how to pass a Boolean to my parser which tells it which one of these is the correct one.  Can I write something like:

                ({dftlOpAND ? AND : OR} $id1 $id2)



More information about the antlr-interest mailing list