[antlr-interest] double NOT removal during tree parsing?

Fri May 29 02:09:28 PDT 2009

Hello,

I'm writing a SQL grammar. I've decided to build a basic AST first and
then optimize it a little during tree parsing. One of the
optimizations is to remove two consecutive NOT nodes in a condition
subtree. My naive implementation is this:

condition
	:	^(OR condition condition)
	|	^(AND condition condition)
	|	(^(NOT ^(NOT .))) => ^(NOT ^(NOT condition)) -> condition
	|	^(NOT condition)
	|	predicate ;

predicate
	:	^(comparisonOperator expression expression) ;

This works perfectly fine for simple conditions (NOT NOT pred). But
when the double NOT subtree is part of a more complex condition (NOT
NOT pred1 AND pred2), the processing goes wrong. When debugging, I see
that the alternative prediction for the pred2 predicate looks wrong.
Moreover, the incoming tokens are not the ones I expect (judging by
the LT events).

To be more specific, the condition is this: NOT NOT col1 < col2 AND
col3 > col4.
So the pred2 predicate stands for col3 > col4. The first pass (the
parser) transforms it into ^('>' col3 col4). When this is processed by
the tree parser (after the double NOT optimization of NOT NOT col1 <
col2), the token '>' is consumed directly in the condition rule! And
the next token (LT event) is not "col3" but a completely different
one, let's say X. With this X token the parser enters the predicate
rule, where it fails, of course (NoViableAltException).
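
To make the expected result concrete, here is a minimal plain-Java
sketch of the rewrite I am after. It is independent of ANTLR: the
DoubleNotDemo and Node classes and the removeDoubleNot method are
hypothetical names used only for illustration, and it assumes NOT
binds tighter than AND, so that the first-pass AST is
^(AND ^(NOT ^(NOT ^('<' col1 col2))) ^('>' col3 col4)).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an AST; this is NOT ANTLR's CommonTree.
final class DoubleNotDemo {

    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) children.add(k);
        }

        // Prints the tree in LISP-like notation, e.g. (AND a b).
        @Override
        public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // True for a NOT node with exactly one operand.
    static boolean isNot(Node n) {
        return n.text.equals("NOT") && n.children.size() == 1;
    }

    // Returns a copy of the tree with every NOT(NOT(x)) pair collapsed to x.
    static Node removeDoubleNot(Node n) {
        // Peel NOT pairs from the top, so NOT NOT NOT x becomes NOT x.
        while (isNot(n) && isNot(n.children.get(0))) {
            n = n.children.get(0).children.get(0);
        }
        Node copy = new Node(n.text);
        for (Node c : n.children) copy.children.add(removeDoubleNot(c));
        return copy;
    }

    public static void main(String[] args) {
        // Assumed AST for: NOT NOT col1 < col2 AND col3 > col4
        Node tree = new Node("AND",
            new Node("NOT", new Node("NOT",
                new Node("<", new Node("col1"), new Node("col2")))),
            new Node(">", new Node("col3"), new Node("col4")));
        System.out.println(tree);                  // (AND (NOT (NOT (< col1 col2))) (> col3 col4))
        System.out.println(removeDoubleNot(tree)); // (AND (< col1 col2) (> col3 col4))
    }
}
```

In other words, the ^('>' col3 col4) subtree should pass through
untouched, and only the left operand of AND should change.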

Do you see anything wrong in the stated tree grammar?

Regards,
Tomas