[antlr-interest] Can't generate the AST I want
John B. Brodie
jbb at acm.org
Mon Aug 8 15:43:58 PDT 2011
On Mon, 2011-08-08 at 19:41 +0000, Scott Smith wrote:
> I am writing a grammar to generate ASTs. There are operators called "AND" and "OR" (defined in the lexer) in the language. I want my parser to produce ASTs that group together all of the terms which are using the same operator.
>
> Also, there can be cases where there is no operator given. In that case, depending on the program configuration, it may be "AND" or "OR". I've called it DFLT_OP for the time being. Here are some examples:
>
> (a OR b OR c) -> (OR a b c)
> a b -> (DFLT_OP a b) // no operator
> a AND (b OR c) AND d ->(AND a (OR b c) d)
> a AND b OR c AND d ->(AND (OR (AND a b) c) d) // I think this one is right; I don't need to reorder terms because AND was used twice
i do not think this is correct.
you have (a AND b OR c AND d) as (((a AND b) OR c) AND d)
but i think most people would expect it to be ((a AND b) OR (c AND d))
e.g. AND has higher precedence than OR (i think)
>
> The grammar I've generated (somewhat simplified) is below. I think everything is pretty vanilla except for the definition of "expression":
>
> grammar testGr;
>
> options {
> language = Java;
> output = AST;
> ASTLabelType = CommonTree;
> }
>
> tokens {
> DFLT_OP;
> }
>
> @header {
> package a.b.c;
> }
>
> @lexer::header {
> package a.b.c;
> }
>
> filter : expression EOF ;
>
> term : IDENTIFIER
> | '('! expression ')'!
> ;
>
> // parsing "(a AND b AND c)" (w/o quotes)
>
> // ATTEMPT 1
> // This gives (AND (AND a b) c). This is correct, but what I
> // really want is (AND a b c)
> //expression
> // : term ((AND^ | OR^)? term)*
> // ;
>
> // ATTEMPT2
> // This doesn't compile due to "recursive rule invocations". I'm
> // also not thrilled the term order changes
> //expression
> // : (id1=term OR)+ id2=term -> ^(OR id2 id1+)
> // | (id1=term AND)+ id2=term -> ^(AND id2 id1+)
> // | id1=term id2=term+ -> ^(DFLT_OP id1 id2+)
> // | term -> term
> // ;
>
> // ATTEMPT3
> // This compiles but appears to produce (AND a c). No "b".
you want to use the `+=` meta operator (not `=`) when collecting a list
of elements on the left of the `->` meta operator. e.g. id2+=term not
id2=term. using the = on a list just gets the last element of the list
as you have observed.
> // expression
> //options {
> // backtrack=true;
> //}
> // : id1=term (OR id2=term)+ -> ^(OR $id1 $id2+)
> // | id1=term (AND id2=term)+ -> ^(AND $id1 $id2+)
> // | id1=term id2=term+ -> ^(DFLT_OP $id1 $id2+)
> // | term -> term
> // ;
> //
>
> // This doesn't compile due to "recursive rule invocations".
> expression
> : id1=term OR id2=term (OR id3=term)+ -> ^(OR $id1 $id2 $id3+)
> | id1=term AND id2=term (AND id3=term)+ -> ^(AND $id1 $id2 $id3+)
> | id1=term id2=term+ -> ^(DFLT_OP $id1 $id2+)
> | term -> term
> ;
>
> AND : 'AND' | '&&' ;
> OR : 'OR' | '||' ;
> IDENTIFIER : LETTER (LETTER | NUMBER)* ;
> fragment LETTER : LOWER | UPPER ;
> fragment LOWER : 'a'..'z' ;
> fragment UPPER : 'A'..'Z' ;
> fragment NUMBER : '0'..'9' ;
> WS : (' ' | '\t' | '\n' | '\r') {$channel=HIDDEN; } ;
>
> Is there a way to tell ANTLR what I want to do? How should I write the expression?
>
> Any thoughts about the minor problem of substituting AND or OR instead of the DFLT_OP. I figured out how to pass a Boolean to my parser which tells it which one of these is the correct one. Can I write something like:
>
> ({dftlOpAND ? AND : OR} $id1 $id2)
I am not sure that you can do this in constructing a root of a sub-tree.
attached please find a complete grammar that solves all 3 of the
problems identified:
a) precedence of AND over OR
b) making AND and OR n-ary trees rather than binary
c) inserting an imaginary AND or OR as root when no connective
operator is present.
Hope this helps...
-jbb
-------------- next part --------------
grammar Test;
options {
output = AST;
ASTLabelType = CommonTree;
}
@members {
boolean dfltOpAND = true;
private static final String [] x = new String[] {
"a", // -> a
"(a OR b OR c)", // -> (OR a b c)
"a AND (b OR c) AND d", // -> (AND a (OR b c) d)
"a AND b OR c AND d", // -> (OR (AND a b) (AND c d))
"a b",
"a b c d",
"a (b OR c) d",
"a b OR c d",
};
public static void main(String [] args) {
for( int i = 0; i < x.length; ++i ) {
try {
System.out.println("about to parse:`"+x[i]+"`");
TestLexer lexer = new TestLexer(new ANTLRStringStream(x[i]));
CommonTokenStream tokens = new CommonTokenStream(lexer);
// System.out.format("dump of the token stream:\%n");
// int j = 0;
// boolean looping = true;
// while( looping ) {
// Token token = lexer.nextToken();
// int typ = token.getType();
// System.out.format("\%d: type = \%s, text = `\%s`\%s\%n",
// j++,
// typ==EOF?"EOF":tokenNames[typ],
// token.getText(),
// token.getChannel()==HIDDEN?" (HIDDEN)":"");
// looping = typ != EOF;
// }
// lexer.reset();
// System.out.format("now performing the parse\n");
TestParser parser = new TestParser(tokens);
TestParser.test_return p_result = parser.test();
CommonTree ast = p_result.tree;
if( ast == null ) {
System.out.println("resultant tree: is NULL");
} else {
System.out.println("resultant tree: " + ast.toStringTree());
}
System.out.println();
} catch(Exception e) {
e.printStackTrace();
}
}
}
}
test : filter ;
filter : expression EOF ;
expression :
(disjunction -> disjunction)
((disjunctive_op x+=disjunction)+ -> ^(disjunctive_op $expression $x+))?
;
disjunctive_op :
( OR -> OR )
| ( ( { ! dfltOpAND }?=>/*empty*/ ) -> OR[input.LT(1), "OR"] )
;
disjunction :
(conjunction -> conjunction)
((conjunctive_op x+=conjunction)+ -> ^(conjunctive_op $disjunction $x+))?
;
conjunctive_op :
( AND -> AND )
| ( ( { dfltOpAND }?=>/*empty*/ ) -> AND[input.LT(1), "AND"] )
;
conjunction : primary ;
primary : IDENTIFIER | '('! expression ')'! ;
AND : 'AND' | '&&' ;
OR : 'OR' | '||' ;
IDENTIFIER : LETTER (LETTER | NUMBER)* ;
fragment LETTER : LOWER | UPPER ;
fragment LOWER : 'a'..'z' ;
fragment UPPER : 'A'..'Z' ;
fragment NUMBER : '0'..'9' ;
WS : (' ' | '\t' | '\n' | '\r') {$channel=HIDDEN; } ;
More information about the antlr-interest
mailing list