[antlr-interest] Starting two parser rules with the same token

Tue Feb 28 19:06:54 PST 2012

Hello,

So the subject text is probably already getting most of you ready to yell
"wrong!" but hear me out.  I'm trying to write a grammar that supports
something like the following:
(1*2/(3-variableOne) >= variableTwo OR variableThree != 4) AND variableFour > 5

Basically I want to be able to use parentheses to group the mathematical
operations [1*2/(3-variableOne) above] as well as to group the boolean
operations [the two comparisons around OR above].  The way the grammar is
laid out, an expression can start with an unbounded number of opening
parentheses, so ANTLR can't immediately tell whether it's at the start of a
grouped mathematical expression or a grouped boolean expression.  For
example, ((1+2)) > 3 and ((1 > 2)) begin with exactly the same tokens.  If I
could limit the depth of nested parentheses, I think I could probably set k
in the options to that same limit and that might help, but I haven't come up
with an elegant way of enforcing a limit.

I feel like this has to be possible because the Java grammar allows me to
do something like:
if ((1*2/(3-variableOne) >= variableTwo || variableThree != 4) && variableFour > 5) { /* do something */ }
and there is apparently an example Java.g for ANTLR, so perhaps it has
already been implemented?  (I haven't actually compiled and tested against
it; I just read Java.g and couldn't figure out how they accomplished it.)

ANTLR is throwing the following error: "rule simpleFilterExpression has
non-LL(*) decision due to recursive rule invocations reachable from alts
1,2.  Resolve by left-factoring or using syntactic predicates or using
backtrack=true option." which makes sense now that I've wrapped my head
around the problem.  After much Googling, I even tried setting the
backtrack option to true, but that didn't seem to help.

I'm pasting the grammar below if anyone would like to take a stab at it.
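
For reference, here is a rough sketch of what the "syntactic predicates"
suggestion in that error message might look like when applied to the
simpleFilterExpression rule from the grammar below.  It is untested against
this grammar and is only an assumption about one possible fix: the
( ... ) => predicate asks ANTLR to speculatively match an additiveExpression
followed by a comparison operator, and to fall back to the parenthesized
boolean alternative if that speculation fails.

```
// Sketch only: same rule as in the grammar below, with a syntactic
// predicate added in front of the first alternative.
simpleFilterExpression
    : (additiveExpression (EQ|DEQ|NEQ|GT|GTE|LT|LTE)) =>
      additiveExpression (EQ|DEQ|NEQ|GT|GTE|LT|LTE)^ additiveExpression
    | LEFT_PARENTHESIS! orFilterExpression RIGHT_PARENTHESIS!
    ;
```

Because the predicate parses the leading expression speculatively and then
parses it again for real, deeply nested input may be scanned more than once.
ANTLR 3's memoize=true option is intended to reduce that cost, though its
effect is not measured here.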

Thanks,
Kunal

Grammar:

options
{
    output=AST;
    ASTLabelType=CommonTree;
}

tokens {
    ADD  = '+' ;
    SUB  = '-' ;
    MULT = '*' ;
    DIV  = '/' ;
    EQ   = '=';
    DEQ  = '==';
    NEQ  = '!=';
    GT   = '>';
    GTE  = '>=';
    LT   = '<';
    LTE  = '<=';
    LEFT_PARENTHESIS  = '(';
    RIGHT_PARENTHESIS = ')';
}

//////////////
// Parser rules
//////////////
// entry point
compoundFilterExpression : orFilterExpression EOF;

// AND takes precedence over OR
orFilterExpression : andFilterExpression (OR^ andFilterExpression)*;

andFilterExpression : simpleFilterExpression (AND^ simpleFilterExpression)*;

simpleFilterExpression
    : additiveExpression (EQ|DEQ|NEQ|GT|GTE|LT|LTE)^ additiveExpression
    | LEFT_PARENTHESIS! orFilterExpression RIGHT_PARENTHESIS!
    ;

// * and / take precedence over + and -
additiveExpression
    : multiplicativeExpression ((ADD|SUB)^ multiplicativeExpression)*
    ;

multiplicativeExpression : atom ((MULT|DIV)^ atom)*;

// There is no way to differentiate between a numeric and string column
// in the grammar so we have to group them together for now and do an
// explicit check while walking the AST
atom
    : COLUMN_NAME
    | FLOAT
    | STRING
    | LEFT_PARENTHESIS! additiveExpression RIGHT_PARENTHESIS!
    ;

//////////////
// Lexer rules (plus the tokens at the top)
//////////////
OR
    : 'or'
    | 'OR'
    | '||'
    | '|'
    ;

AND
    : 'and'
    | 'AND'
    | '&&'
    | '&'
    ;

COLUMN_NAME : ('a'..'z'|'A'..'Z')+ ; // anything from a-z and A-Z

FLOAT
    : ('0'..'9')+ '.' ('0'..'9')+    // 123.456
    | '.' ('0'..'9')+                // .456
    | ('0'..'9')+                    // 123
    ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )+ '"'
    ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;

WS : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;