[antlr-interest] Help controlling parser decisions

Wed Jul 25 20:37:42 PDT 2007

Thanks for the input, Gavin.

I'm still finding that no matter how I tweak the parser rules, it's really
the order of the lexer rules that determines how a token is classified,
semantic predicates be damned.

It seems that if more than one lexer rule can match the same text, the
rule that comes first always wins.
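A rough model of this behaviour, sketched in Java (the grammar's action language). It applies the OP, WS and WCHAR rules from the grammar below under an assumed policy: the longest match wins, and equal-length matches go to the rule listed first. That approximates, but is not, the DFA ANTLR generates. The class name `LexerOrderDemo` and the `EQ` rule standing in for the `'='` literal are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of an ANTLR-style lexer; not ANTLR's generated code.
class LexerOrderDemo {
    // One lexer rule: a token name plus a regex approximating the grammar rule.
    record Rule(String name, Pattern pattern) {
        static Rule of(String name, String regex) {
            return new Rule(name, Pattern.compile(regex));
        }
    }

    static final Rule OP = Rule.of("OP", "NEAR|near");
    static final Rule WS = Rule.of("WS", "[ \t\r\n]+");
    static final Rule WCHAR = Rule.of("WCHAR", "[^=()\"# \t\n\r]+");
    static final Rule EQ = Rule.of("EQ", "=");   // stands in for the '=' literal

    // At each position every rule tries to match. The longest match wins;
    // on a tie, the rule that appears earlier in the list is kept.
    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            for (Rule r : rules) {
                Matcher m = r.pattern().matcher(input).region(pos, input.length());
                // Strict '>' means a later rule must be strictly longer to win.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = r;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            if (!best.name().equals("WS")) {   // WS is discarded, like {skip();}
                types.add(best.name());
            }
            pos = bestEnd;
        }
        return types;
    }

    public static void main(String[] args) {
        List<Rule> opFirst = List.of(EQ, OP, WS, WCHAR);    // order used in the grammar
        List<Rule> wcharFirst = List.of(EQ, WS, WCHAR, OP); // OP moved below WCHAR
        System.out.println(tokenize("TAG=APPLES NEAR ORANGES", opFirst));
        System.out.println(tokenize("TAG=NEAR APPLES ORANGES", opFirst));
        System.out.println(tokenize("TAG=APPLES NEAR ORANGES", wcharFirst));
    }
}
```

Running `main` prints `[WCHAR, EQ, WCHAR, OP, WCHAR]`, `[WCHAR, EQ, OP, WCHAR, WCHAR]` and `[WCHAR, EQ, WCHAR, WCHAR, WCHAR]`. With OP listed first, NEAR is an OP wherever it appears; with WCHAR first, it never is. Either way the decision is made before the parser, or any predicate in it, sees the token. Under this policy a longer word such as `NEARBY` still comes out as WCHAR.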

My objective is to allow the term NEAR to be entered as a boolean operator,
except when it begins or ends a sequence of terms; in that case it should be
recognized as a WCHAR:
TAG=APPLES NEAR ORANGES   //this should parse NEAR as an OP
TAG=NEAR APPLES ORANGES   //this should treat NEAR as a WCHAR

Here is my simplified grammar. Can anyone see what changes it needs to meet
that objective? Sorry if I'm just not getting it; this has been a
challenging exercise.

grammar WQL;

options{
   output=AST;
   ASTLabelType=CommonTree;
}

query  : tag '=' terms ;

tag    : WCHAR ;

terms  : WCHAR+ (OP^ WCHAR+)* ;   // an OP needs a WCHAR on both sides

OP     : 'NEAR' | 'near' ;
WS     : (' '|'\t'|'\r'|'\n')+ {skip();} ;
WCHAR  : ~('='|'('|')'|'"'|' '|'\t'|'\n'|'\r'|'#')+ ;   // also matches "NEAR"

Thank you,
Ted

On 7/25/07, Gavin Lambert <antlr at mirality.co.nz> wrote:
>
> At 04:52 26/07/2007, Ted Villalba wrote:
> >The major difference between our grammars is yours does not have
> >any lexer rules for the operator NEAR, so there is no conflict.
> >Adding the BOOL_OP lexer rule back in breaks that example.
> [...]
> >query :  tag '=' keyBOOL terms+
> >       ;
> >
> >terms  : WCHAR+
> >        ;
> >
> >tag    : WCHAR
> >        ;
> >
> >keyBOOL: near
> >        ;
> >
> >near:   {input.LT(1).getText().toLowerCase().equals("near")}?
> >WCHAR
> >        ;
> >
> >BOOL_OP :  'NEAR'; //comment this out to get working
> >WS      : (' '|'\t'|'\r'|'\n')+ {skip();};
> >WCHAR   : ~('='|'('| ')'|'"'|' '|'\t'|'\n'|'\r'|'#')+;
>
> I think you need to make your parser more lenient.
>
> 1. Rename 'BOOL_OP' to 'NEAR', and don't add any other keywords to
> it -- give those their own separate lexer rules.
>
> 2. Create a parser rule 'bool_op' that accepts NEAR.
>
> 3. Remove 'terms' because it's pointless (you've already got
> 'multiple characters' at the lexing level, and 'multiple terms' at
> the 'query' level).
>
> 4. Remove the 'near' rule and the 'keyBOOL' rule (since you've got
> 'bool_op' now).
>
> 5. Wherever a bool_op can be used in a non-keyword context, add it
> as an alternative.  Presumably, this means changing 'tag' to
> "WCHAR | bool_op".
>
> You'll still end up with a NEAR token output, not a WCHAR, but it
> should match now.
>
>
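Gavin's suggestion can be sketched outside ANTLR as a small hand-written recursive-descent parser in Java. This is not ANTLR-generated code; the class `LenientQueryParser`, the token-type names `EQ`/`NEAR`/`WCHAR` and the toy `lex` helper are hypothetical. It assumes the lexer always emits NEAR for the keyword (Gavin's step 1) and lets the parser decide by position: a NEAR at either edge of the term list is kept as an ordinary word, and only a NEAR followed by another word acts as the operator. Telling an operator NEAR from a trailing one takes two tokens of lookahead.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the "lenient parser" idea; not ANTLR-generated code.
class LenientQueryParser {
    record Tok(String type, String text) {}

    private final List<Tok> toks;
    private int pos = 0;

    LenientQueryParser(List<Tok> toks) { this.toks = toks; }

    // Toy tokenizer for the examples only: "=" is EQ, NEAR/near is always
    // NEAR (whatever its position), anything else is WCHAR.
    static List<Tok> lex(String input) {
        List<Tok> out = new ArrayList<>();
        for (String w : input.replace("=", " = ").trim().split("\\s+")) {
            String type = w.equals("=") ? "EQ"
                    : (w.equals("NEAR") || w.equals("near")) ? "NEAR" : "WCHAR";
            out.add(new Tok(type, w));
        }
        return out;
    }

    static String parse(String input) { return new LenientQueryParser(lex(input)).query(); }

    // Look k tokens ahead; past the end, report a synthetic EOF token.
    private Tok peek(int k) {
        int i = pos + k;
        return i < toks.size() ? toks.get(i) : new Tok("EOF", "<EOF>");
    }

    private boolean at(int k, String type) { return peek(k).type().equals(type); }

    private Tok match(String type) {
        Tok t = peek(0);
        if (!t.type().equals(type)) {
            throw new IllegalStateException("expected " + type + " but found " + t.type());
        }
        pos++;
        return t;
    }

    // query : tag '=' terms EOF ;
    String query() {
        // tag : WCHAR | bool_op ;   (Gavin's step 5)
        Tok tag = at(0, "NEAR") ? match("NEAR") : match("WCHAR");
        match("EQ");
        String terms = terms();
        match("EOF");
        return tag.text() + " = " + terms;
    }

    // Roughly: terms : NEAR? WCHAR+ (NEAR WCHAR+)* NEAR? ;
    // Edge NEARs stay ordinary words; an inner NEAR becomes the root of
    // everything collected so far, in the spirit of OP^ in the grammar.
    String terms() {
        String root = null;
        List<String> kids = new ArrayList<>();
        if (at(0, "NEAR")) kids.add(match("NEAR").text());   // leading NEAR = word
        kids.add(match("WCHAR").text());
        while (at(0, "WCHAR")) kids.add(match("WCHAR").text());
        while (at(0, "NEAR")) {
            if (!at(1, "WCHAR")) {                          // trailing NEAR = word
                kids.add(match("NEAR").text());
                break;
            }
            Tok op = match("NEAR");                         // inner NEAR = operator
            if (root != null) {
                String subtree = "(" + root + " " + String.join(" ", kids) + ")";
                kids = new ArrayList<>(List.of(subtree));
            }
            root = op.text();
            do { kids.add(match("WCHAR").text()); } while (at(0, "WCHAR"));
        }
        return root == null ? String.join(" ", kids)
                            : "(" + root + " " + String.join(" ", kids) + ")";
    }
}
```

For example, `LenientQueryParser.parse("TAG=APPLES NEAR ORANGES")` returns `TAG = (NEAR APPLES ORANGES)`, while `parse("TAG=NEAR APPLES ORANGES")` returns `TAG = NEAR APPLES ORANGES`, with NEAR kept as a plain word. As Gavin notes, the edge NEAR is still a NEAR token rather than a WCHAR; the parser simply accepts it where a word is expected.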