[antlr-interest] Help controlling parser decisions

Tue Jul 24 11:03:13 PDT 2007

On 7/25/07, Ted Villalba <ted.villalba at gmail.com> wrote:
> Hi,
>
> I have a grammar with tokens that are sometimes operators and sometimes
> not, depending on the context. The set of operators overlaps with the set
> of all words that are acceptable tokens. The trouble is that, depending on
> the order of my lexer rules, the parser either recognizes all such tokens
> (AND, OR, NEAR) as operators or recognizes none of them as operators.
>
> So if my lexer rules are:
> BOOL_OP    : 'AND'|'and'|'OR'|'or'|'NOT'|'not';
> WOK_OP    : 'SAME'|'same'|'NEAR'('/'DIGIT+)*|'near'('/'DIGIT+)*;
> ...
> WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;
>
> In this order, whenever a token from the first two rules is encountered,
> the parser treats it as an operator, even where no grammar rule supports
> that interpretation (and then fails with a NoViableAltException). If the
> rules are reversed, none of these words are recognized as operators; they
> all come through as WCHAR tokens.
>
> So if I try to parse something like:
> SO=(BY THE AIRPORT), it works fine, but SO=(NEAR THE AIRPORT) throws the
> exception, because "NEAR" is forced into the role of an operator even though
> the grammar does not allow an operator at the beginning of a phrase.
Lexing happens independently of parsing, so the parser's context cannot
influence which tokens are matched: the lexer has already decided that NEAR
is a WOK_OP before the parser ever sees it.
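To make this concrete, here is a toy Java model of how a lexer picks a token type. It is an approximation of ANTLR 3's behaviour, assuming that the longest match wins and that on a tie the rule listed first wins; the class `LexerOrderDemo`, the regex translations of the rules, and the `EQ` name for the '=' literal are all hypothetical. Note that `tokenize` never looks at what the parser expects next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy model (not ANTLR's generated code) of lexer rule selection:
 * longest match wins, ties go to the earlier rule. Parser context is never consulted.
 */
public class LexerOrderDemo {

    /** A lexer rule approximated by a regular expression. */
    static final class Rule {
        final String name;
        final Pattern pattern;
        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Hypothetical regex translations of the grammar's rules (EQ stands in for the '=' literal).
    static final Rule BOOL_OP = new Rule("BOOL_OP", "AND|and|OR|or|NOT|not");
    static final Rule WOK_OP  = new Rule("WOK_OP", "SAME|same|(?:NEAR|near)(?:/[0-9]+)*");
    static final Rule EQ      = new Rule("EQ", "=");
    static final Rule LPAREN  = new Rule("LPAREN", "\\(");
    static final Rule RPAREN  = new Rule("RPAREN", "\\)");
    static final Rule WCHAR   = new Rule("WCHAR", "[^=()\"# \t\n\r]+");

    // The two rule orders discussed in the question.
    static final List<Rule> OPERATORS_FIRST = List.of(BOOL_OP, WOK_OP, EQ, LPAREN, RPAREN, WCHAR);
    static final List<Rule> WCHAR_FIRST     = List.of(WCHAR, BOOL_OP, WOK_OP, EQ, LPAREN, RPAREN);

    /** Splits input into "TYPE:text" tokens, skipping the WS characters. */
    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (" \t\r\n".indexOf(input.charAt(pos)) >= 0) { // WS : ... {skip();}
                pos++;
                continue;
            }
            Rule best = null;
            int bestLen = 0;
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // Strictly longer only: on a tie the earlier rule keeps the token.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = rule;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            tokens.add(best.name + ":" + input.substring(pos, pos + bestLen));
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "SO=(NEAR THE AIRPORT)";
        System.out.println(tokenize(input, OPERATORS_FIRST)); // NEAR becomes WOK_OP
        System.out.println(tokenize(input, WCHAR_FIRST));     // NEAR becomes WCHAR
    }
}
```

With the operator rules first, NEAR in SO=(NEAR THE AIRPORT) comes out as WOK_OP:NEAR; with WCHAR first it comes out as WCHAR:NEAR. A longer word such as NEARBY is WCHAR under either order, because the longer match wins.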
See http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741 for
the two possible solutions.
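One way to act on this, sketched under assumptions and not necessarily either of the solutions on the linked page: leave the lexer alone and let the parser accept BOOL_OP and WOK_OP tokens as ordinary words wherever an operator cannot appear. The hand-written recursive-descent parser below (hypothetical class KeywordAsWord, simplified: no parentheses or quotes) loosely mirrors value_ : terms+ (operator^ value)*. Its lookahead check is an assumed policy: an operator token counts as a word when it has no left or no right operand. It returns trees as strings in a LISP-like form similar to what toStringTree() prints.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: the lexer still emits BOOL_OP/WOK_OP for words such as NEAR,
 * but the parser accepts them as ordinary words wherever an operator cannot appear.
 */
public class KeywordAsWord {

    /** A token as the lexer produced it. */
    static final class Token {
        final String type;
        final String text;
        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static Token tok(String type, String text) {
        return new Token(type, text);
    }

    private final List<Token> tokens;
    private int pos = 0;

    KeywordAsWord(List<Token> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("a value needs at least one token");
        }
        this.tokens = tokens;
    }

    private static boolean isOperator(Token t) {
        return t.type.equals("BOOL_OP") || t.type.equals("WOK_OP");
    }

    /** value : terms (operator value)? ; the operator becomes the subtree root, as with operator^. */
    String value() {
        String left = terms();
        if (pos < tokens.size()) {
            Token op = tokens.get(pos++); // terms() only stops early at a genuine operator
            return "(" + op.text + " " + left + " " + value() + ")";
        }
        return left;
    }

    /** terms : word+ ; where word : WCHAR | BOOL_OP | WOK_OP, guarded by a lookahead check. */
    String terms() {
        List<String> words = new ArrayList<>();
        while (pos < tokens.size()) {
            Token t = tokens.get(pos);
            boolean noLeftOperand = words.isEmpty();            // nothing for an operator to combine on the left
            boolean noRightOperand = pos == tokens.size() - 1;  // nothing for an operator to combine on the right
            if (isOperator(t) && !noLeftOperand && !noRightOperand) {
                break; // a genuine operator: leave it for value()
            }
            words.add(t.text); // WCHAR, or a keyword demoted to an ordinary word
            pos++;
        }
        return "(TERMS " + String.join(" ", words) + ")";
    }

    public static void main(String[] args) {
        // Token streams as the lexer produces them when BOOL_OP/WOK_OP precede WCHAR.
        List<Token> nearFirst = List.of(tok("WOK_OP", "NEAR"), tok("WCHAR", "THE"), tok("WCHAR", "AIRPORT"));
        List<Token> nearBetween = List.of(tok("WCHAR", "BY"), tok("WOK_OP", "NEAR"), tok("WCHAR", "AIRPORT"));
        System.out.println(new KeywordAsWord(nearFirst).value());   // (TERMS NEAR THE AIRPORT)
        System.out.println(new KeywordAsWord(nearBetween).value()); // (NEAR (TERMS BY) (TERMS AIRPORT))
    }
}
```

In the grammar itself the corresponding change would probably be a rule along the lines of word : WCHAR | BOOL_OP | WOK_OP ; used inside terms, plus a predicate to resolve the ambiguity between "word" and "operator" that this introduces.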

Tom.
>
> Here is my complete grammar:
>
> grammar WQL;
>
> options{
>     output=AST;
>     ASTLabelType=CommonTree;
> }
>
> tokens{ TAG; VALUE; TERMS;} //imaginary token types
>
> @header{
> import java.util.HashMap ;
> }
>
> @members {
>
> HashMap fieldMap = new HashMap();
>
> }
>
>
> start   : ( query
>             {System.out.println("AST:\n"+$query.tree.toStringTree());}
>           )+
>         ;
>
>
> query   : field (BOOL_OP^ query)*
>     | LPAREN! query RPAREN! ( BOOL_OP^ query)*
>     ;
>
> field   : tag '=' LPAREN value RPAREN -> ^('=' tag value)
>         | tag '=' terms+ -> ^('=' tag terms+)
>         | qid
>         ;
>
> value   :  value_ -> ^(VALUE value_) ;
>
> value_  : terms+ (operator^ value)*
>     | LPAREN! value RPAREN! ( operator^ value)*
>     ;
>
> tag    : WCHAR
>     ;
>
> terms   : WCHAR+  -> ^(TERMS WCHAR+ )
>         | QUOTE WCHAR+ QUOTE -> ^(TERMS WCHAR+ ) // strip QUOTEs
>         ;
>
>
> qid     : '#'! DIGIT
>         ;
>
> operator: BOOL_OP|WOK_OP;
>
>
> BOOL_OP    : 'AND'|'and'|'OR'|'or'|'NOT'|'not';
> WOK_OP    : 'SAME'|'same'|'NEAR'('/'DIGIT+)*|'near'('/'DIGIT+)*;
> DIGIT   : ('0'..'9');
> WS      : (' '|'\t'|'\r'|'\n')+ {skip();};
> LPAREN    : '(' ;
> RPAREN    : ')' ;
> QUOTE    : '"';
> WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;
>
>
> Thanks a million for the help.
>
> Ted