[antlr-interest] Help with a parser
John B. Brodie
jbb at acm.org
Tue Aug 2 18:02:29 PDT 2011
Greetings!
On Tue, 2011-08-02 at 23:11 +0000, Scott Smith wrote:
> I assume this is the proper place to put this. I'm trying to build a parser for filters generated by SOLR (lucene.apache.org).
I think this is exactly the right place to post questions of this
kind.
>
> Examples of valid "sentences" the parser should parse are:
>
> fq = fred
>
> fq = (fred OR bill)
>
> fq = harry:(fred OR bill)
>
> fq = (harry:fred OR jack:bill)
>
> fq = ((harry:fred OR bill) AND (jane OR marry:sally))
>
> terms can be nested to arbitrary levels. The colon really binds to the word before it (e.g., "harry:").
>
> I've listed the parser below (which doesn't work). Can someone suggest what I can do? It seems like a simple problem, but so far I haven't cracked it. I will admit that I've only been playing with Antlr the last week or so. I did play all of Scott Stanchfield's excellent videos on vimeo. But, still I'm confused.
>
> When I run the parser in antlrworks with example 3 ("fq = harry:(fred OR bill)" - no quote marks), it finds "fq = harry:(fred" and then it wants the right paren instead of expanding out the filter_expr rule. What am I missing?
I assume you are using the Interpreter in ANTLRWorks and not the
Debugger.
If so, do not do that. ANTLRWorks' interpreter has a few quirks.
I do not use ANTLRWorks so cannot really say for sure, but I think using
the Debugger is the preferred way to go when using ANTLRWorks.
Your grammar, with just 1 change, operates just as you expect when
run through the ANTLR Tool, compiled, and executed from the command line.
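Your WS rule needs some parens to group the characters to be hidden,
due to ANTLR's meta-operator precedence for actions vs alternatives: an
action belongs to the alternative it is written in, so in your rule only
the last alternative ('\u3000') gets sent to the hidden channel. So
replace your WS rule with this: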
WS : (' ' | '\t' | '\n' | '\r' | '\u3000') {$channel=HIDDEN; } ;
and it just works.
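
To see why the parens matter, here is how I would expect ANTLR to group
your original rule if the implicit grouping were written out, next to
the corrected rule (a sketch for illustration, not tool output):

```antlr
// original rule with the implicit grouping made explicit:
// the action belongs to the last alternative only
WS : ' ' | '\t' | '\n' | '\r' | ( '\u3000' {$channel=HIDDEN;} ) ;

// corrected rule: the action runs after any of the alternatives
WS : ( ' ' | '\t' | '\n' | '\r' | '\u3000' ) {$channel=HIDDEN;} ;
```

With the first form a plain space or tab still becomes a WS token on the
default channel, so the parser sees it between 'fq' and '=' and has no
rule that accepts it.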
I can post my test driver if you want...
Oh, and a small nitpick: the concept of KEYWORD usually equates to a
reserved word in the language (like your AND, OR, and NOT). I would
suggest that the concept of IDENTIFIER more closely matches what you
have as KEYWORD. Just nitpicking, sorry....
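
The rename would touch three places in your grammar. A sketch, assuming
IDENTIFIER as the new name (the name is only my suggestion, nothing in
ANTLR requires it):

```antlr
// parser rule: IDENTIFIER takes the place of KEYWORD
term
    : IDENTIFIER
    | STRING
    | '(' filter_expr ')'
    ;

// lexer rules: same definitions as before, only the name changes
FIELDNAME  : IDENTIFIER ':' ;
IDENTIFIER : LETTER (LETTER | NUM_CHAR | '_')* ;
```

AND, OR, and NOT stay where they are, listed above IDENTIFIER.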
Hope this helps.
-jbb
> Thanks
>
> Scott
>
> Here's the parser.
>
> grammar testGrammer;
>
> options {
> language = Java;
> }
>
> @header {
> package a.b.c;
> }
>
> @lexer::header {
> package a.b.c;
> }
>
> filter:
> 'fq' '=' filter_expr EOF
> ;
>
> term
> : KEYWORD
> | STRING
> | '(' filter_expr ')'
> ;
>
> fieldname
> : FIELDNAME? term
> ;
>
>
> filter_expr:
> fieldname (((AND | OR | NOT))? fieldname)*
> ;
>
> FIELDNAME
> : KEYWORD ':' ;
>
> AND : 'AND' | '&&' ;
> OR : 'OR' | '||' ;
> NOT : 'NOT' | '!' ;
> KEYWORD : LETTER (LETTER | NUM_CHAR | '_')*;
> STRING : '"' NONCONTROL_CHAR* '"' ;
> WS : ' ' | '\t' | '\n' | '\r' | '\u3000' {$channel=HIDDEN; } ;
>
> fragment NONCONTROL_CHAR: LETTER | NUM_CHAR | SPACE | SYMBOL;
> fragment SYMBOL: ' '..'!' | '#'..'/' | ':'..'@' | '['..'`' | '{'..'~';
> fragment LETTER: LOWER | UPPER;
> fragment LOWER: 'a'..'z';
> fragment UPPER: 'A'..'Z';
> fragment NUM_CHAR: '0'..'9';
> fragment SPACE: ' ' | '\t';