[antlr-interest] Simplest working boolean grammar

Alfonso barbolani at ya.com
Mon Sep 17 16:31:41 PDT 2007


   
I think I've finally built a "correct" grammar for parsing boolean 
expressions


grammar Success;


tokens {
AND = 'and';
OR = 'or';
}

file    :    boolean_expression EOF;

boolean_expression:   
        boolean_term ( OR boolean_term)*
    ;
   
boolean_term
    :    boolean_atom (AND boolean_atom)*
    ;
 
boolean_atom
    :   
          (OPEN_PAREN boolean_expression CLOSE_PAREN) => OPEN_PAREN 
boolean_expression CLOSE_PAREN
        | expression EQ expression
    ;

expression: term ( (PLUS|MINUS) term)*
    ;
 
term    :
      atom ( (DIVIDE | ASTERISK) atom)*
    ;

atom    :
      NUMBER
    | (OPEN_PAREN expression CLOSE_PAREN) => OPEN_PAREN expression 
CLOSE_PAREN
    ;

NUMBER    : ('0'..'9')+;
   
PLUS     : '+';
MINUS    : '-';
DIVIDE  : '/';
ASTERISK : '*';
EQ    : '=';

OPEN_PAREN : '(' ;
CLOSE_PAREN: ')' ;
   
This grammar seems to do the right thing under the ANTLRWorks debugger 
(the interpreter is still failing to interpret it correctly, though) I 
had to use the  => operation to resolve the ambiguity that exists when 
the boolean expression starts with a parenthesis, since the parser does 
not know if this is part of a boolean_expression or is really the start 
of an expression until it sees either the matching closing parenthesis 
and the next token or an EQ.

I don't understand fully the code that ANTLR generates, but I know how 
would I code this if I were generating the recursive descent parser by 
hand. When finding an opening parenthesis, I'd start trying to parse the 
input stream as if it were a boolean expression. If during the parsing 
found something syntactically incorrect inside the parenthesis, I'll  
"rewind" the stream just before the starting parenthesis and start 
trying to parse the expression EQ expression part.

However, it seems a naive approach, since by the time you decide that 
you've not found a OPEN_PAREN boolean_expression CLOSE_PAREN, you've 
already gone a long way of parsing the actual expression EQ expression 
(at least the first atom) I'm not sure of how I would try to optimize 
this so that the work done on parsing the first atom so far could be 
saved, but my question is, is ANTLR doing the same as my first stab at 
doing it by hand?

I know that those are basic questions, but I'm so intrigued by ANTLR (if 
only the works interpreter worked....) that is tempting to explore.


More information about the antlr-interest mailing list