[antlr-interest] Starting two parser rules with the same token
Kunal Naik
kunal.a.naik at gmail.com
Tue Feb 28 19:06:54 PST 2012
Hello,
The subject line probably already has most of you ready to yell "wrong!", but
hear me out. I'm trying to write a grammar that supports something like the
following:
(1*2/(3-variableOne) >= variableTwo OR variableThree != 4) AND variableFour > 5
Basically, I want to be able to use parentheses to group the arithmetic
operations [the (3-variableOne) in 1*2/(3-variableOne) above] as well as to
group the boolean operations [binding the two comparisons around OR above].
The way the grammar is laid out, an expression can start with any number of
opening parentheses, so ANTLR can't tell from the first few tokens whether it
is at the start of a grouped arithmetic expression or a grouped boolean
expression. If I could limit the nesting depth of parentheses, I think I could
set k in the options to that same limit, which might help, but I haven't come
up with an elegant way of enforcing such a limit.
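The small Java sketch below (Java 9+) shows why no fixed k can cover this. LookaheadDemo and its methods are hypothetical names made up for illustration. The sketch builds two token sequences that the grammar above should accept: ((( a > 1 ))), which is a grouped boolean expression, and ((( a ))) > 1, which is a grouped operand. It then counts how many tokens must be examined before the two sequences differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Builds two inputs the grammar should accept that share their first n + 1 tokens
// but belong to different alternatives of simpleFilterExpression.
class LookaheadDemo {
    // "((( a > 1 )))": n parentheses around a comparison (boolean grouping, alt 2).
    static List<String> booleanGroup(int n) {
        List<String> t = new ArrayList<>(Collections.nCopies(n, "("));
        t.addAll(List.of("a", ">", "1"));
        t.addAll(Collections.nCopies(n, ")"));
        return t;
    }

    // "((( a ))) > 1": n parentheses around an operand (arithmetic grouping, alt 1 via atom).
    static List<String> arithmeticGroup(int n) {
        List<String> t = new ArrayList<>(Collections.nCopies(n, "("));
        t.add("a");
        t.addAll(Collections.nCopies(n, ")"));
        t.addAll(List.of(">", "1"));
        return t;
    }

    // Tokens a parser must look at before the two inputs can be told apart.
    static int tokensNeeded(List<String> x, List<String> y) {
        int i = 0;
        while (i < x.size() && i < y.size() && x.get(i).equals(y.get(i))) i++;
        return i + 1;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            int k = tokensNeeded(booleanGroup(n), arithmeticGroup(n));
            System.out.println(n + " opening parentheses -> at least " + k + " tokens of lookahead");
        }
    }
}
```

With n opening parentheses, the two sequences agree on their first n + 1 tokens. A parser therefore needs at least n + 2 tokens to pick the alternative, and because n is unbounded, so is the required lookahead.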
I feel like this has to be possible, because Java lets me write something
like:
if ((1*2/(3-variableOne) >= variableTwo || variableThree != 4) && variableFour > 5) { /* do something */ }
and there is apparently an example Java.g for ANTLR, so perhaps this has
already been implemented there? (I haven't actually compiled and tested
against it; I just read Java.g and couldn't figure out how it accomplishes
this.)
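For comparison, Java's expression grammar doesn't try to tell boolean and arithmetic parentheses apart. A parenthesised expression is just an expression, comparisons and &&/|| are simply lower-precedence levels, and whether a sub-expression is boolean is checked later by the compiler's type checker.

Below is a minimal hand-written Java sketch of that approach (Java 9+). It assumes it is acceptable to move the boolean/arithmetic distinction into the AST walk you already do for strings versus numbers. UnifiedFilterParser, parseFilter and isBoolean are hypothetical names, STRING literals are omitted, and the code is not derived from Java.g itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical "one expression rule" parser: '(' always wraps a full expression
// and boolean-vs-arithmetic is decided afterwards by isBoolean (STRING omitted).
class UnifiedFilterParser {
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(>=|<=|==|!=|&&|\\|\\||[A-Za-z]+|\\d+\\.\\d+|\\.\\d+|\\d+|[-+*/()=<>&|])");
    private static final Set<String> COMPARISONS = Set.of("=", "==", "!=", ">", ">=", "<", "<=");

    // AST node; leaves (column names, numbers) have op == null.
    static final class Node {
        final String op, text;
        final Node left, right;
        Node(String op, Node left, Node right) { this.op = op; this.text = null; this.left = left; this.right = right; }
        Node(String text) { this.op = null; this.text = text; this.left = null; this.right = null; }
        @Override public String toString() { return op == null ? text : "(" + op + " " + left + " " + right + ")"; }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private UnifiedFilterParser(String input) {
        String s = input.trim();
        Matcher m = TOKEN.matcher(s);
        for (int i = 0; i < s.length(); i = m.end()) {
            m.region(i, s.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("bad character at offset " + i);
            String t = m.group(1);
            // Normalise the alternative spellings allowed by the OR and AND lexer rules.
            tokens.add(Set.of("and", "AND", "&&", "&").contains(t) ? "AND"
                     : Set.of("or", "OR", "||", "|").contains(t) ? "OR" : t);
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }
    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalArgumentException("expected " + t + " but found " + peek());
        pos++;
    }

    // Each rule decides from the current token alone, so no backtracking is needed.
    private Node orExpr() {
        Node n = andExpr();
        while (peek().equals("OR")) { pos++; n = new Node("OR", n, andExpr()); }
        return n;
    }
    private Node andExpr() {
        Node n = cmpExpr();
        while (peek().equals("AND")) { pos++; n = new Node("AND", n, cmpExpr()); }
        return n;
    }
    private Node cmpExpr() {  // at most one comparison, as in the original grammar
        Node n = addExpr();
        if (COMPARISONS.contains(peek())) { String op = tokens.get(pos++); n = new Node(op, n, addExpr()); }
        return n;
    }
    private Node addExpr() {
        Node n = mulExpr();
        while (peek().equals("+") || peek().equals("-")) { String op = tokens.get(pos++); n = new Node(op, n, mulExpr()); }
        return n;
    }
    private Node mulExpr() {
        Node n = atom();
        while (peek().equals("*") || peek().equals("/")) { String op = tokens.get(pos++); n = new Node(op, n, atom()); }
        return n;
    }
    private Node atom() {
        String t = peek();
        if (t.equals("(")) { pos++; Node n = orExpr(); expect(")"); return n; }  // any expression may be grouped
        char c = t.charAt(0);
        if ((Character.isLetterOrDigit(c) || c == '.') && !t.equals("AND") && !t.equals("OR")) { pos++; return new Node(t); }
        throw new IllegalArgumentException("unexpected " + t);
    }

    // Post-parse check: AND/OR need boolean operands; comparisons and arithmetic need non-boolean ones.
    static boolean isBoolean(Node n) {
        if (n.op == null) return false;
        boolean l = isBoolean(n.left), r = isBoolean(n.right);
        boolean logical = n.op.equals("AND") || n.op.equals("OR");
        if (logical ? !(l && r) : (l || r)) throw new IllegalArgumentException("type error in " + n);
        return logical || COMPARISONS.contains(n.op);
    }

    static Node parseFilter(String input) {
        UnifiedFilterParser p = new UnifiedFilterParser(input);
        Node root = p.orExpr();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("unexpected " + p.peek());
        if (!isBoolean(root)) throw new IllegalArgumentException("not a boolean filter: " + root);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parseFilter("(1*2/(3-variableOne) >= variableTwo OR variableThree != 4) AND variableFour > 5"));
    }
}
```

Every choice here can be made from the current token, so equivalent ANTLR rules would need neither a larger k nor backtracking. The trade-off is that an input like (a + b) AND c > 1 is now rejected by the isBoolean walk rather than by the parser.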
ANTLR reports the following error: "rule simpleFilterExpression has
non-LL(*) decision due to recursive rule invocations reachable from alts
1,2. Resolve by left-factoring or using syntactic predicates or using
backtrack=true option." That makes sense now that I've wrapped my head
around the problem. After much Googling, I also tried setting the backtrack
option to true, but that didn't seem to help.
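The minimal hand-written Java sketch below (Java 9+) shows what backtracking is supposed to do for simpleFilterExpression. The parser speculatively tries the comparison alternative first. If that fails, it rewinds the token position and only then treats the '(' as the start of a grouped boolean expression. In ANTLR 3 terms, this roughly corresponds to putting a syntactic predicate in front of the comparison alternative.

This is not ANTLR-generated code. FilterParser and its methods are hypothetical, STRING literals are omitted, and each rule returns a Lisp-style string instead of building a CommonTree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written parser for the filter grammar; rules return Lisp-style strings.
class FilterParser {
    static final class ParseError extends RuntimeException { ParseError(String m) { super(m); } }
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(>=|<=|==|!=|&&|\\|\\||[A-Za-z]+|\\d+\\.\\d+|\\.\\d+|\\d+|[-+*/()=<>&|])");
    private static final Set<String> COMPARISONS = Set.of("=", "==", "!=", ">", ">=", "<", "<=");
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    FilterParser(String input) {
        String s = input.trim();
        Matcher m = TOKEN.matcher(s);
        for (int i = 0; i < s.length(); i = m.end()) {
            m.region(i, s.length());
            if (!m.lookingAt()) throw new ParseError("bad character at offset " + i);
            String t = m.group(1);
            // Normalise the alternative spellings allowed by the OR and AND lexer rules.
            tokens.add(Set.of("and", "AND", "&&", "&").contains(t) ? "AND"
                     : Set.of("or", "OR", "||", "|").contains(t) ? "OR" : t);
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }
    private void expect(String t) {
        if (!peek().equals(t)) throw new ParseError("expected " + t + " but found " + peek());
        pos++;
    }

    // compoundFilterExpression : orFilterExpression EOF
    String parse() {
        String tree = orExpr();
        if (pos != tokens.size()) throw new ParseError("unexpected " + peek());
        return tree;
    }

    private String orExpr() {
        String left = andExpr();
        while (peek().equals("OR")) { pos++; left = "(OR " + left + " " + andExpr() + ")"; }
        return left;
    }

    private String andExpr() {
        String left = simpleExpr();
        while (peek().equals("AND")) { pos++; left = "(AND " + left + " " + simpleExpr() + ")"; }
        return left;
    }

    // simpleFilterExpression: the decision ANTLR reports as non-LL(*).
    private String simpleExpr() {
        int mark = pos;
        try {                                   // alt 1: comparison
            String left = addExpr();
            String op = peek();
            if (!COMPARISONS.contains(op)) throw new ParseError("expected comparison, found " + op);
            pos++;
            return "(" + op + " " + left + " " + addExpr() + ")";
        } catch (ParseError e) {                // alt 1 failed: rewind, then try alt 2
            pos = mark;
            expect("(");
            String inner = orExpr();
            expect(")");
            return inner;
        }
    }

    private String addExpr() { return binary(this::mulExpr, "+", "-"); }
    private String mulExpr() { return binary(this::atom, "*", "/"); }
    // Left-associative loop shared by additiveExpression and multiplicativeExpression.
    private String binary(Supplier<String> operand, String op1, String op2) {
        String left = operand.get();
        while (peek().equals(op1) || peek().equals(op2)) {
            String op = tokens.get(pos++);
            left = "(" + op + " " + left + " " + operand.get() + ")";
        }
        return left;
    }

    // atom : COLUMN_NAME | FLOAT | '(' additiveExpression ')'
    private String atom() {
        String t = peek();
        if (t.equals("(")) { pos++; String inner = addExpr(); expect(")"); return inner; }
        char c = t.charAt(0);
        if ((Character.isLetterOrDigit(c) || c == '.') && !t.equals("AND") && !t.equals("OR")) { pos++; return t; }
        throw new ParseError("unexpected " + t);
    }

    public static void main(String[] args) {
        System.out.println(new FilterParser("(1*2/(3-variableOne) >= variableTwo OR variableThree != 4) AND variableFour > 5").parse());
    }
}
```

For deeply nested input, backtracking like this can re-parse the same tokens more than once. The sketch is meant to show what the predicate or backtrack option buys you, not to replace the generated parser.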
I'm pasting the grammar below in case anyone would like to take a stab at it.
Thanks,
Kunal
Grammar:
options
{
    output=AST;
    ASTLabelType=CommonTree;
}
tokens {
    ADD = '+' ;
    SUB = '-' ;
    MULT = '*' ;
    DIV = '/' ;
    EQ = '=';
    DEQ = '==';
    NEQ = '!=';
    GT = '>';
    GTE = '>=';
    LT = '<';
    LTE = '<=';
    LEFT_PARENTHESIS = '(';
    RIGHT_PARENTHESIS = ')';
}
//////////////
// Parser rules
//////////////
// entry point
compoundFilterExpression : orFilterExpression EOF;
// AND takes precedence over OR
orFilterExpression : andFilterExpression (OR^ andFilterExpression)*;
andFilterExpression : simpleFilterExpression (AND^ simpleFilterExpression)*;
simpleFilterExpression
    : additiveExpression (EQ|DEQ|NEQ|GT|GTE|LT|LTE)^ additiveExpression
    | LEFT_PARENTHESIS! orFilterExpression RIGHT_PARENTHESIS!
    ;
// * and / take precedence over + and -
additiveExpression : multiplicativeExpression ((ADD|SUB)^ multiplicativeExpression)*;
multiplicativeExpression : atom ((MULT|DIV)^ atom)*;
// There is no way to differentiate between a numeric and string column
// in the grammar so we have to group them together for now and do an
// explicit check while walking the AST
atom
    : COLUMN_NAME
    | FLOAT
    | STRING
    | LEFT_PARENTHESIS! additiveExpression RIGHT_PARENTHESIS!
    ;
//////////////
// Lexer rules (plus the tokens at the top)
//////////////
OR
    : 'or'
    | 'OR'
    | '||'
    | '|'
    ;
AND
    : 'and'
    | 'AND'
    | '&&'
    | '&'
    ;
COLUMN_NAME : ('a'..'z'|'A'..'Z')+ ; // anything from a-z and A-Z
FLOAT
    : ('0'..'9')+ '.' ('0'..'9')+ // 123.456
    | '.' ('0'..'9')+ // .456
    | ('0'..'9')+ // 123
    ;
STRING
    : '"' ( ESC_SEQ | ~('\\'|'"') )+ '"'
    ;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
    : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    | UNICODE_ESC
    | OCTAL_ESC
    ;
fragment
OCTAL_ESC
    : '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    | '\\' ('0'..'7') ('0'..'7')
    | '\\' ('0'..'7')
    ;
fragment
UNICODE_ESC
    : '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;
WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;