[antlr-interest] parsing boolean expressions: not not or abc

Kevin J. Cummings cummings at kjchome.homeip.net
Thu Jan 14 08:20:28 PST 2010


On 01/14/2010 04:10 AM, lord.of.board at gmx.de wrote:
> Hello,
> 
> I am trying to build a grammar which accepts boolean expressions for filtering. I found some interesting articles on the web, but now I got stuck.
> I try to parse something like this:
> 
>   not not or abc
> 
> The first "not" is the boolean operator and the second is a text.

NOT term OR term

> Or even worse
> 
>   not not and not or and not and

Gawk!  NOT term AND NOT term AND NOT term ????  It took me a couple of
seconds to figure out how this would be legal!  B^)

The parser is *definitely* going to need help figuring out when "not" is
a NOT and when it is a term!

> My grammar look like this:
> 
> grammar TextFilterGrammar;
> options {
> 	output=AST;
> }
> content :	orexpression
> 	;
> orexpression 
> 	:	andexpression (OR^ andexpression)*
> 	;
> andexpression 
> 	:	expression (AND^ expression)*
> 	;
> expression 
> 	:	(NOT^)? term
> 	;
> term 	:	WORD
> 	;
> 
> NOT 	:	'not'
> 	;
> AND 	:	'and'
> 	;
> OR 	:	'or'
> 	;

So, NOT, AND, and OR are reserved words in your grammar.

> WORD	:	('a'..'z' | '0'..'9' | '%' | '_')+
> 	;
> WS 	:	(' ' | '\r' | '\n' | '\t')  { skip(); }
> 	;
> 
> In ANTLRWorks I always get a MismatchedTokenException when trying to parse "not not or ljsdf". Parsing e.g. "not noti or ljsdf" works fine.
> 
> I managed to get it working with quotation marks, but I would prefer to have a solution without.

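"not" will always match your token named NOT; the lexer will never hand
it to the parser as a WORD.  That is why "not not or ljsdf" fails: the
parser sees NOT NOT where it expects NOT WORD.  If you wish to allow it
as a term, you might want to change your term production to be: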

term : WORD | NOT | AND | OR
     ;

This should effectively allow "not", "and", and "or" to be keywords
that can still appear as terms, instead of reserved words.
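
To see what the parser is actually handed, here is a rough Python sketch
of what the lexer rules do.  It is a hand-written stand-in, not ANTLR
output: the tokenize helper and the KEYWORDS table are names made up for
illustration, and it assumes the input is whitespace-separated.

```python
import re

# Hypothetical stand-ins for the NOT / AND / OR lexer rules.
KEYWORDS = {"not": "NOT", "and": "AND", "or": "OR"}
# Same character set as the WORD rule: a-z, 0-9, '%' and '_'.
WORD_RE = re.compile(r"[a-z0-9%_]+")


def tokenize(text):
    """Classify each whitespace-separated piece as NOT, AND, OR or WORD."""
    tokens = []
    for piece in text.split():
        if not WORD_RE.fullmatch(piece):
            raise ValueError(f"unexpected input: {piece!r}")
        # An exact keyword spelling never becomes a WORD.
        tokens.append((KEYWORDS.get(piece, "WORD"), piece))
    return tokens


print(tokenize("not not or ljsdf"))
print(tokenize("not noti or ljsdf"))
```

The first input comes out as NOT NOT OR WORD, which (NOT)? term cannot
accept while term is only WORD.  The second comes out as NOT WORD OR
WORD, which is why "noti" parses fine.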

But then, how do you want the parser to handle the sequence "not not"?
Is the second "not" a term or another NOT operator?  Given that you are
only allowing one optional NOT in your expression production, adding the
operators to your term production should work.  But you'll be in a world
of hurt if you change (NOT)? term to (NOT)* term, as then there is no
way to know if a following "not" is a term or a NOT....  [gawk! the puns
are getting bad!]

You may need to add a syntactic predicate to your grammar around the NOT
stuff:

expression : (NOT term)=> (NOT^) term
           | term
           ;

should help you out here....
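
The predicate's effect can be sketched as a small hand-written
recursive-descent parser in Python.  This is only an illustration under
assumptions: the Parser class, its method names, and the tuple-shaped
trees are hypothetical, term is assumed to accept WORD | NOT | AND | OR
as suggested above, and the one-token peek stands in for the
(NOT term)=> predicate.

```python
KEYWORDS = {"not": "NOT", "and": "AND", "or": "OR"}


def tokenize(text):
    # Keyword spellings always lex as NOT/AND/OR; everything else is a WORD.
    return [(KEYWORDS.get(piece, "WORD"), piece) for piece in text.split()]


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i][0] if i < len(self.tokens) else "EOF"

    def take(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def content(self):
        tree = self.orexpression()
        if self.peek() != "EOF":
            raise SyntaxError(f"unexpected {self.tokens[self.pos][1]!r}")
        return tree

    def orexpression(self):
        left = self.andexpression()
        while self.peek() == "OR":
            self.take()
            left = ("or", left, self.andexpression())
        return left

    def andexpression(self):
        left = self.expression()
        while self.peek() == "AND":
            self.take()
            left = ("and", left, self.expression())
        return left

    def expression(self):
        # Stand-in for (NOT term)=>: every token type can be a term here,
        # so "a term follows" only means the input has not run out.
        if self.peek() == "NOT" and self.peek(1) != "EOF":
            self.take()
            return ("not", self.term())
        return self.term()

    def term(self):
        if self.peek() == "EOF":
            raise SyntaxError("expected a term, found end of input")
        return self.take()[1]


print(Parser("not not or abc").content())
print(Parser("not not and not or and not and").content())
```

In this sketch a "not" is taken as the operator whenever anything
follows it, so "not and abc" is read as NOT "and" with a stray "abc"
and is rejected, while a lone "not" is accepted as a term.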

> Best regards,
> Lordi

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)

