[antlr-interest] boolean expression with parenthesis parsing problems

Kenneth Stephen marvin.the.cynical.robot at gmail.com
Tue Feb 27 16:15:16 PST 2007


Hi,

    I'm trying to write a parser for a trimmed down version of SQL.
Here is the ANTLR file that I've come up with so far:

header {

package com.ibm.aix.internal.parsers.sql;

	//  Generated code. Do not modify directly unless you really really know
	//  what you are doing.
}

class SQLParser extends Parser;

options {
	k = 10;
}

select_statement:
	select_clause from_clause (where_clause)?
	;
	
select_clause:
	SELECT column (COMMA column)*
	;

from_clause:
	FROM table (COMMA table)*
	;
	
where_clause:
	WHERE search_condition
	;

search_condition:
	(LPAREN or_expr RPAREN) | or_expr;
	
or_expr:
	and_expr (OR and_expr)*;
	
and_expr:
	predicate (AND predicate)*;
	
predicate:
	(predicate_side TEST_OPERATOR predicate_side);
	
predicate_side:
	((correlation_name FULLSTOP)? column_name)| STRING | INTEGER;

column:
	(((correlation_name FULLSTOP)? column_name)| STRING) ((AS)? column_alias)?;

column_name:
	i:IDENTIFIER {System.out.println("column_name = " + i.getText());};

column_alias: IDENTIFIER;
	
correlation_name: IDENTIFIER;

table:
	table_name (correlation_name)?
	;

table_name: i:IDENTIFIER {System.out.println("Table name = " + i.getText());};
	
class SQLLexer extends Lexer;

options {
	k = 2;
}

tokens {
	SELECT = "select";
	AS = "as";
	FROM = "from";
	WHERE = "where";
	AND = "and";
	OR = "or";
	NOT = "not";
}

IDENTIFIER
	: (ALPHA)(ALPHA|DIGIT)*
	;

STRING
	: '\'' ('\u0009'|'\u0020' | NONWHITESPACE | ESCAPED_APOS)* '\''
	;
	
INTEGER
	: (DIGIT)+
	;
	
protected
NONWHITESPACE
	: '\u0021'..'\u0026'|'\u002a'|'\u002b'|'\u002d'|'\u002F'
	| LPAREN | RPAREN | COMMA | FULLSTOP | DIGIT
	| '\u003A'..'\u0040'|'\u005b'..'\u0060'| ALPHA
	| '\u007b'..'\u007e'|'\u0085'|'\u00e0'..'\u00ff'
	;

protected
DIGIT
	: '\u0030'..'\u0039'
	;

protected
ALPHA
	: '\u0041'..'\u005a' | '\u0061'..'\u007a'
	;
	
WHITESPACE
	: ('\n' {newline();}|"\r\n" {newline();}|' '|'\t')
	{$setType(Token.SKIP);}
	;
	
LPAREN: '\u0028';
RPAREN: '\u0029';

COMMA : '\u002c' {System.out.println("Gotcha!");};

FULLSTOP : '\u002e';

protected
ESCAPED_APOS
	: "''"
	;
	
TEST_OPERATOR
	: '='|'<'|"<="|'>'|">="|"<>"
	;

    My current problem is getting boolean expressions parsed in the
where clause. The rules search_condition, or_expr, and_expr, predicate,
and predicate_side are intended to do that. The following statement
should be recognized:

 select blah, test, 'something' from jiminy where (a = b and (c = d or e = f))

...but instead parsing fails with this output:

Gotcha!
column_name = blah
Gotcha!
column_name = test
Table name = jiminy
column_name = a
column_name = b
line 1:61: unexpected token: (
column_name = d
column_name = e
column_name = f

    I understand why the error is happening: it's because a
"predicate" should be substitutable by a "search_condition", so that a
parenthesized condition can be nested inside another one. However,
I don't know how to fix this problem. When I try something like

predicate:
	(predicate_side TEST_OPERATOR predicate_side) | search_condition;

    ...ANTLR itself seems to go into an infinite loop and dies after
exhausting the JVM stack. This is puzzling to me because the example
for the expression parser at
http://www.cs.usfca.edu/~parrt/course/652/lectures/antlr.html seems
to do with the ATOM production just what I tried. Can someone
guide me onto the right path please?
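
    One observation on the attempt above: with that change, predicate
can reach search_condition, whose second alternative goes straight
back through or_expr and and_expr to predicate without consuming a
token, so the rules are left-recursive. That is presumably what sends
ANTLR into the loop. The following is only an untested sketch of one
possible restructuring, not a confirmed fix. It assumes the
parenthesized alternative can move down into predicate, so that LPAREN
is always consumed before the rules recurse; the other rules stay as
they are.

search_condition:
	or_expr;

or_expr:
	and_expr (OR and_expr)*;

and_expr:
	predicate (AND predicate)*;

// Assumption: nesting is handled here, guarded by LPAREN, so every
// recursive path back to or_expr consumes at least one token.
predicate:
	(LPAREN or_expr RPAREN)
	| (predicate_side TEST_OPERATOR predicate_side);

    Under that assumption, the test statement would be derived as
LPAREN or_expr RPAREN at the outer level, with (c = d or e = f)
matched by the same alternative one level down.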

Thanks,
Kenneth

