[antlr-interest] Problems writing a searchbar language

Aurélien LARIVE aurelien.larive at 4dconcept.fr
Mon Jan 11 08:31:20 PST 2010


Below is the e-mail John B. Brodie sent to me, which solved my problem.

John B. Brodie wrote :

Greetings!

(I tried to send this to the mail-list, but the list seems to be
rejecting my e-mail at the moment.... sigh)

When you have an implicit AND (e.g. whitespace), your andexpression
sub-tree will not have any root. It will be just a list of notexpression
sub-trees, which your tree walker is not prepared to handle.

More below.....

On Mon, 2010-01-11 at 12:51 +0100, Aurélien LARIVE wrote:
> Hi,
> 
> I'm currently writing a small grammar to parse a searchbar language and
> I'm failing at making whitespaces behave like the AND keyword.
> 
> Here is my grammar :
> 
> grammar SearchBar;
> 
> options {
>     output=AST;
> }
> 
> WS  : ( ' ' | '\t' ) { skip(); } ;
> AND : 'AND' ;
> OR  : 'OR' ;
> NOT : 'NOT' ;
> LEFT_PAREN  : '(' ;
> RIGHT_PAREN : ')' ;
> TERM        : ~(' '|'\t'|'"'|RIGHT_PAREN|LEFT_PAREN|NOT|OR|AND)* ;
> QUOTEDTERM  : '"' ~('"')* '"' ;
> 
> orexpression
>     : andexpression ( OR^ andexpression )*
>     ;
> 
> andexpression
>     : notexpression ( (AND^)? notexpression )*
>     ;

When the AND is absent (i.e. an AND implied by whitespace), nothing is
promoted to root, so (I THINK) you will just end up with a flat list of
notexpression sub-trees.

I suggest these parser rules instead (tested!):

andexpression
     : notexpression ( and_operator^ notexpression )*
     ;

and_operator : AND | (/*empty*/->AND["implicit_AND"]) ;

NOTE!!! The token spawned for "implicit_AND" above may not contain
meaningful location information (e.g. line number, column, ...whatever).
If that information is important to your application (usually for error
messages), you may need to dig into the details of the "X[...]" ANTLR
meta-notation for token insertion....
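
To make the resulting tree shape concrete, here is a minimal sketch in
plain Java of the same idea. It is NOT ANTLR-generated code: it is a
hand-written recursive-descent parser, and the class name
ImplicitAndSketch, its tokenizer, and the string tree format are all
assumptions made for illustration. Quoted terms are left out. As with
and_operator^ in the rule above, each explicit or implied AND becomes
the root of everything parsed so far, so the tree nests to the left.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SearchBar parser; prints LISP-style trees.
public class ImplicitAndSketch {
    private final List<String> tokens;
    private int pos = 0;

    private ImplicitAndSketch(List<String> tokens) { this.tokens = tokens; }

    // Splits on spaces, tabs and parentheses (quoted terms are not handled).
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == ' ' || c == '\t' || c == '(' || c == ')') {
                if (cur.length() > 0) { out.add(cur.toString()); cur.setLength(0); }
                if (c == '(' || c == ')') out.add(String.valueOf(c));
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    static String parse(String input) {
        ImplicitAndSketch p = new ImplicitAndSketch(tokenize(input));
        String tree = p.orExpression();
        if (p.pos != p.tokens.size())
            throw new IllegalArgumentException("unexpected token: " + p.peek());
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // orexpression : andexpression ( OR^ andexpression )*
    private String orExpression() {
        String left = andExpression();
        while ("OR".equals(peek())) {
            pos++;
            left = "(OR " + left + " " + andExpression() + ")";
        }
        return left;
    }

    // andexpression : notexpression ( and_operator^ notexpression )*
    private String andExpression() {
        String left = notExpression();
        while (peek() != null && !"OR".equals(peek()) && !")".equals(peek())) {
            if ("AND".equals(peek())) pos++;  // explicit AND; otherwise it is implied
            left = "(AND " + left + " " + notExpression() + ")";
        }
        return left;
    }

    // notexpression : (NOT^)? searchterm
    private String notExpression() {
        if ("NOT".equals(peek())) { pos++; return "(NOT " + searchTerm() + ")"; }
        return searchTerm();
    }

    // searchterm : TERM | LEFT_PAREN! orexpression RIGHT_PAREN!
    private String searchTerm() {
        String t = peek();
        if (t == null) throw new IllegalArgumentException("unexpected end of input");
        pos++;
        if ("(".equals(t)) {
            String inner = orExpression();
            if (!")".equals(peek())) throw new IllegalArgumentException("missing ')'");
            pos++;
            return inner;
        }
        if (")".equals(t) || "AND".equals(t) || "OR".equals(t))
            throw new IllegalArgumentException("unexpected token: " + t);
        return t;
    }
}
```

With this sketch, ImplicitAndSketch.parse("apples bananas tomatos")
returns "(AND (AND apples bananas) tomatos)", and "apples AND bananas"
gives the same shape as "apples bananas", which is what lets the tree
walker's ^(AND a=expr b=expr) alternative handle both spellings.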

> 
> notexpression
>     : (NOT^)? searchterm
>     ;
> 
> searchterm
>     : TERM
>     | QUOTEDTERM
>     | LEFT_PAREN! orexpression RIGHT_PAREN!
>     ;
> 
> And here is my tree grammar :
> 
> tree grammar SearchBarEval;
> 
> options {
>     ASTLabelType=CommonTree;
>     tokenVocab=SearchBar;
> }
> 
> prog
>     : expr+ ;
> 
> expr returns [XMSExpression expression]
>     : ^(OR a=expr b=expr) {
>         $expression = new Or($a.expression, $b.expression);
>     }
>     | ^(AND a=expr b=expr) {
>         $expression = new And($a.expression, $b.expression);
>     }
>     | ^(NOT a=expr) {
>         $expression = new Not($a.expression);
>     }
>     | TERM {
>         $expression = new Term($TERM.text);
>     }
>     | QUOTEDTERM {
>         $expression = new QuotedTerm($QUOTEDTERM.text);
>     }

If you would rather not apply the parser changes suggested above, you
might be able to alter the tree grammar as follows (UNTESTED!):

Add an alternative to the expr rule (I think it has to be at the end,
not sure...):
       | implicit_and
>     ;
> 
And then add an implicit_and rule (UNTESTED!):

implicit_and returns [XMSExpression expression]
       : a=expr {$expression = $a.expression;}
           ( b=implicit_and {
               $expression = new And($a.expression, $b.expression);
             }
           )?
       ;
> When I try to evaluate, for example, the input 'apples bananas tomatos',
> I only get the Term 'apples'. I understand why I'm having this problem
> but I was unable to find a good solution.
> 
> Thanks in advance,

Hope this helps....
    -jbb