[antlr-interest] Help controlling parser decisions

Ted Villalba ted.villalba at gmail.com
Tue Jul 24 10:56:29 PDT 2007


Hi,

I have a grammar that contains tokens that are sometimes operators and
sometimes not, depending on the context. The set of operators overlaps with
the set of all words that are acceptable tokens. The trouble is that,
depending on the order of my lexer rules, the parser either recognizes all
such tokens (AND, OR, NEAR) as operators or recognizes none of them as
operators.

So if my lexer rules are:
BOOL_OP    : 'AND'|'and'|'OR'|'or'|'NOT'|'not';
WOK_OP    : 'SAME'|'same'|'NEAR'('/'DIGIT+)*|'near'('/'DIGIT+)*;
...
WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;

In this order, whenever a token matching one of the first two rules is
encountered, the parser treats it as an operator, even where no grammar rule
allows an operator there (and it then throws a NoViableAltException). If the
rules are reversed, none of these words are recognized as operators; they all
come through as plain WCHAR tokens.
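The ordering effect can be modeled, in simplified form, with a few lines of
Java. The sketch assumes that at each input position the longest match wins
and that, when two rules match the same text, the rule listed first wins. It
is an approximation for illustration, not ANTLR's generated lexer code: the
regexes only approximate the rules above, DIGIT is left out, and the name EQ
for the '=' literal is hypothetical. The key point is that "NEAR" matches
WOK_OP and WCHAR with exactly the same length, so rule order alone decides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of lexer rule selection (an approximation, not ANTLR's generated code):
// the longest match wins, and on a tie the rule listed first wins.
class LexerRuleOrder {

    // Regex approximations of the rules in the post; "EQ" is a hypothetical name for '='.
    static final Pattern BOOL_OP = Pattern.compile("AND|and|OR|or|NOT|not");
    static final Pattern WOK_OP = Pattern.compile("SAME|same|NEAR(/[0-9]+)*|near(/[0-9]+)*");
    static final Pattern WCHAR = Pattern.compile("[^=()\"# \\t\\n\\r]+");

    // Builds the rule table; LinkedHashMap preserves declaration order.
    static Map<String, Pattern> rules(boolean operatorsFirst) {
        Map<String, Pattern> table = new LinkedHashMap<>();
        if (operatorsFirst) {
            table.put("BOOL_OP", BOOL_OP);
            table.put("WOK_OP", WOK_OP);
        }
        table.put("EQ", Pattern.compile("="));
        table.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
        table.put("LPAREN", Pattern.compile("\\("));
        table.put("RPAREN", Pattern.compile("\\)"));
        table.put("QUOTE", Pattern.compile("\""));
        table.put("WCHAR", WCHAR);
        if (!operatorsFirst) {
            table.put("BOOL_OP", BOOL_OP);
            table.put("WOK_OP", WOK_OP);
        }
        return table;
    }

    // Returns the token types produced for the input, skipping whitespace.
    static List<String> tokenize(String input, boolean operatorsFirst) {
        Map<String, Pattern> table = rules(operatorsFirst);
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String best = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : table.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the earlier rule keeps the token.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            if (!best.equals("WS")) {
                types.add(best);
            }
            pos = bestEnd;
        }
        return types;
    }

    public static void main(String[] args) {
        String query = "SO=(NEAR THE AIRPORT)";
        System.out.println("operators first: " + tokenize(query, true));
        System.out.println("WCHAR first:     " + tokenize(query, false));
    }
}
```

With the operators listed first, the model yields
[WCHAR, EQ, LPAREN, WOK_OP, WCHAR, WCHAR, RPAREN], so NEAR is an operator even
at the start of the phrase; with WCHAR first it yields
[WCHAR, EQ, LPAREN, WCHAR, WCHAR, WCHAR, RPAREN]. A longer word such as
ANDROID still lexes as WCHAR either way, because it is the longer match. Since
the lexer picks token types without knowing what the parser expects,
reordering the rules only flips the problem rather than solving it.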

So if I try to parse something like SO=(BY THE AIRPORT), it works fine,
but if I try SO=(NEAR THE AIRPORT), it throws the exception, trying to force
"NEAR" into the role of an operator even though the grammar does not allow an
operator at the beginning of a phrase.
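One common way out, sketched here as plain Java rather than as ANTLR grammar,
is to let the lexer emit every word as an ordinary word token and have the
parser decide from context whether a given word is an operator. The class and
method names below are hypothetical, and so is the exact positional rule (a
word counts as an operator only if it is spelled like one, follows a term, and
has another word after it). In an ANTLR 3 grammar the same decision would
typically be expressed with a semantic predicate on WCHAR; that grammar form is
not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: every word arrives as a plain word token, and the parser
// decides from its position whether the word plays the operator role.
class ContextualOperators {

    // Operator spellings copied from the BOOL_OP and WOK_OP rules (case variants included).
    private static final Pattern OPERATOR =
        Pattern.compile("AND|and|OR|or|NOT|not|SAME|same|NEAR(/[0-9]+)*|near(/[0-9]+)*");

    // Labels each word of a phrase as "TERM" or "OP".
    // A word is an operator only if it is spelled like one, directly follows a term,
    // and is followed by another word (so it has a right operand).
    static List<String> classify(List<String> words) {
        List<String> roles = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            boolean spelledLikeOp = OPERATOR.matcher(words.get(i)).matches();
            boolean hasLeftOperand = i > 0 && roles.get(i - 1).equals("TERM");
            boolean hasRightOperand = i + 1 < words.size();
            roles.add(spelledLikeOp && hasLeftOperand && hasRightOperand ? "OP" : "TERM");
        }
        return roles;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("NEAR", "THE", "AIRPORT")));
        System.out.println(classify(Arrays.asList("BY", "NEAR", "AIRPORT")));
        System.out.println(classify(Arrays.asList("BY", "THE", "NEAR")));
    }
}
```

Under this rule, NEAR at the start or end of a phrase stays an ordinary term,
while BY NEAR AIRPORT (or BY NEAR/5 AIRPORT) treats it as an operator. The rule
is deliberately simple: a word directly after an operator is always a term, so
a sequence like A AND NOT B would need a richer rule.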

Here is my complete grammar:

grammar WQL;

options{
    output=AST;
    ASTLabelType=CommonTree;
}

tokens{ TAG; VALUE; TERMS;} //imaginary token types

@header{
import java.util.HashMap;
}

@members {

HashMap fieldMap = new HashMap();

}





start   : ( query  {System.out.println("AST:\n"+$query.tree.toStringTree());} )+
        ;


query   : field (BOOL_OP^ query)*
    | LPAREN! query RPAREN! ( BOOL_OP^ query)*
    ;

field   : tag '=' LPAREN value RPAREN -> ^('=' tag value)
        | tag '=' terms+ -> ^('=' tag terms)
        | qid
        ;

value   :  value_ -> ^(VALUE value_) ;

value_  : terms+ (operator^ value)*
    | LPAREN! value RPAREN! ( operator^ value)*
    ;

tag    : WCHAR
    ;

terms   : WCHAR+  -> ^(TERMS WCHAR+ )
    | QUOTE WCHAR+ QUOTE -> ^(TERMS WCHAR+ ) // strip QUOTEs
    ;


qid     : '#'!DIGIT
        ;

operator: BOOL_OP|WOK_OP;


BOOL_OP    : 'AND'|'and'|'OR'|'or'|'NOT'|'not';
WOK_OP    : 'SAME'|'same'|'NEAR'('/'DIGIT+)*|'near'('/'DIGIT+)*;
DIGIT   : ('0'..'9');
WS      : (' '|'\t'|'\r'|'\n')+ {skip();};
LPAREN    : '(' ;
RPAREN    : ')' ;
QUOTE    : '"';
WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;


Thanks a million for the help.

Ted