[antlr-interest] Help controlling parser decisions

Ted Villalba ted.villalba at gmail.com
Tue Jul 24 16:19:06 PDT 2007


Thanks for the responses.
Creating the disambiguating semantic predicate seems straightforward
enough, but perhaps I'm not starting out with the right assumptions.

If I want to accept NEAR as a term when it begins (or ends) a sentence, I
thought I could do something like this:

value   :  value_ -> ^(VALUE value_) ;

value_  :  keyBOOL terms* (operator^ value)*
            | LPAREN! value RPAREN! ( operator^ value)*
            ;

keyBOOL : {input.LT(1).getText().equals("NEAR")}? terms;

terms   : WCHAR+  -> ^(TERMS WCHAR+ )
           ;

But when I try to enter SO=(NEAR apples oranges), the parser rejects it.
I am still getting:
     line 1:5 no viable alternative at input 'NEAR'.

Am I missing an obvious puzzle piece?
I tried instead to assume all booleans were terms and then tested each of
the terms in a similar approach, but wasn't successful yet at
differentiating, on demand, the operators from the terms.

Thank you for the help,
Ted



On 7/24/07, Thomas Brandon <tbrandonau at gmail.com> wrote:
>
> On 7/25/07, Ted Villalba <ted.villalba at gmail.com> wrote:
> > Hi,
> >
> > I have a grammar that contains tokens that are sometimes operators,
> > sometimes not, depending on the context. The set of operators overlaps
> > with the set of all words that can be acceptable tokens. Trouble is,
> > depending on the order of my lexer rules, the parser will recognize all
> > such tokens (AND, OR, NEAR) as operators, or will recognize none of them
> > as operators.
> >
> > So if my lexer rules are:
> > BOOL_OP    : 'AND'|'and'|'OR'|'or'|'NOT'|'not';
> > WOK_OP    :
> > 'SAME'|'same'|'NEAR'('/'DIGIT+)*|'near'('/'DIGIT+)*;
> > ...
> > WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;
> >
> > In this order, if any of the tokens from the first 2 rules are
> > encountered, the parser assumes the token to be an operator, even where
> > there is no grammar rule to support the notion (and will follow with a
> > NoViableAlt exception). If the rules are reversed, it will not recognize
> > any of the wchar+ as operators.
> >
> > So if I try to parse something like:
> > SO=(BY THE AIRPORT), then it works fine, but if I try
> > SO=(NEAR THE AIRPORT) it throws the exception, trying to force "NEAR"
> > into the role of operator, even if the grammar does not support the idea
> > of an operator at the beginning of a phrase.
> Lexing occurs independently of parsing so parser context does not
> influence which tokens are matched.
> See http://www.antlr.org/wiki/pages/viewpage.action?pageId=1741 for
> the two possible solutions.
>
> Tom.
> >
> > Here is my complete grammar:
> >
> > grammar WQL;
> >
> > options{
> >     output=AST;
> >     ASTLabelType=CommonTree;
> > }
> >
> > tokens{ TAG; VALUE; TERMS;} //imaginary token types
> >
> > @header{
> > import java.util.HashMap ;
> > }
> >
> > @members {
> >
> > HashMap fieldMap = new HashMap();
> >
> > }
> >
> >
> >
> >
> >
> > start   : ( query
> > {System.out.println("AST:\n"+$query.tree.toStringTree());}
> > )+
> >         ;
> >
> >
> > query   : field (BOOL_OP^ query)*
> >     | LPAREN! query RPAREN! ( BOOL_OP^ query)*
> >     ;
> >
> > field     : tag '=' LPAREN value RPAREN -> ^('=' tag value)
> >     | tag '=' terms+ -> ^('=' tag terms)
> >         | qid
> >         ;
> >
> > value   :  value_ -> ^(VALUE value_) ;
> >
> > value_  : terms+ (operator^ value)*
> >     | LPAREN! value RPAREN! ( operator^ value)*
> >     ;
> >
> > tag    : WCHAR
> >     ;
> >
> >  terms   : WCHAR+  -> ^(TERMS WCHAR+ )
> >     | QUOTE WCHAR+ QUOTE -> ^(TERMS WCHAR+ ) // strip QUOTEs
> >     ;
> >
> >
> > qid     : '#'!DIGIT
> >         ;
> >
> > operator: BOOL_OP|WOK_OP;
> >
> >
> > BOOL_OP    : 'AND'|'and'|'OR'|'or'|'NOT'|'not';
> > WOK_OP    :
> > 'SAME'|'same'|'NEAR'('/'DIGIT+)*|'near'('/'DIGIT+)*;
> > DIGIT   : ('0'..'9');
> > WS      : (' '|'\t'|'\r'|'\n')+ {skip();};
> > LPAREN    : '(' ;
> > RPAREN    : ')' ;
> > QUOTE    : '"';
> > WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;
> >
> >
> > Thanks a million for the help.
> >
> > Ted
> >
> >
> >
>