[antlr-interest] Help with a parser
Scott Smith
ssmith at mainstreamdata.com
Tue Aug 2 21:44:49 PDT 2011
Thanks so much. I guess I was looking for something complicated in how I wrote the EBNF. I used the same parser (with your modification) on a much more complicated example and it also worked. So, this was very helpful.
I will look at your other suggestion as well. I like people with an opinion (nitpicky or not)!! I appreciate your quick response.
Thanks again
I also appreciate Kirby taking the time to make suggestions. While his "suspicion" wasn't correct, I still appreciate the comments as it will help me learn.
Cheers to you both. I'll get back to trying to understand Mr. Parr's books. :-)
Scott
-----Original Message-----
From: John B. Brodie [mailto:jbb at acm.org]
Sent: Tuesday, August 02, 2011 7:02 PM
To: Scott Smith
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Help with a parser
Greetings!
On Tue, 2011-08-02 at 23:11 +0000, Scott Smith wrote:
> I assume this is the proper place to put this. I'm trying to build a parser for filters generated by SOLR (lucene.apache.org).
I think this is exactly the right place to post questions of this kind.
>
> Examples of valid "sentences" the parser should parse are:
>
> fq = fred
>
> fq = (fred OR bill)
>
> fq = harry:(fred OR bill)
>
> fq = (harry:fred OR jack:bill)
>
> fq = ((harry:fred OR bill) AND (jane OR marry:sally))
>
> terms can be nested to arbitrary levels. The colon really binds to the word before it (e.g., "harry:").
>
> I've listed the parser below (which doesn't work). Can someone suggest what I can do? It seems like a simple problem, but so far I haven't cracked it. I will admit that I've only been playing with Antlr the last week or so. I did play all of Scott Stanchfield's excellent videos on vimeo. But, still I'm confused.
>
> When I run the parser in antlrworks with example 3 ("fq = harry:(fred OR bill)" - no quote marks), it finds "fq = harry:(fred" and then it wants the right paren instead of expanding out the filter_expr rule. What am I missing?
I assume you are using the Interpreter in ANTLRWorks and not the Debugger.
If so, do not do that. ANTLRWorks' interpreter has a few quirks.
I do not use ANTLRWorks so I cannot really say for sure, but I think using the Debugger is the preferred way to go when using ANTLRWorks.
Your grammar, with just 1 change, operates just as you expect when Tool'd, Compiled, and Executed from the command line.
Your WS rule needs some parens to group the characters to be hidden, due to ANTLR's meta-operator precedence for actions vs alternatives: without them the action attaches only to the last alternative ('\u3000'). So replace your WS rule with this:
WS : (' ' | '\t' | '\n' | '\r' | '\u3000') {$channel=HIDDEN; } ;
and it just works.
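To illustrate the precedence point, here is a sketch of how ANTLR 3 groups the original rule compared with the corrected one. The rule names WS_AS_WRITTEN and WS_FIXED are hypothetical labels used only for this side-by-side comparison; in the real grammar there is a single rule named WS.

```antlr
// How the original WS rule is grouped: the action belongs only to the
// last alternative, so only '\u3000' is sent to the hidden channel.
// Spaces, tabs and newlines reach the parser as ordinary WS tokens.
WS_AS_WRITTEN
    : ' '
    | '\t'
    | '\n'
    | '\r'
    | '\u3000' {$channel=HIDDEN;}
    ;

// With parentheses, the alternatives form one subrule and the action
// runs after any of them matches, so every whitespace character is hidden.
WS_FIXED
    : (' ' | '\t' | '\n' | '\r' | '\u3000') {$channel=HIDDEN;}
    ;
```

Since no parser rule in the grammar mentions WS, a visible WS token cannot be matched anywhere, which is presumably why input containing spaces is rejected until the parentheses are added.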
I can post my test driver if you want...
Oh, and a small nitpick: the concept of KEYWORD usually equates to a reserved word in the language (like your AND, OR, and NOT). I would suggest that the concept of IDENTIFIER more closely matches what you have as KEYWORD. Just nitpicking, sorry....
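As a sketch of that renaming (assuming the name IDENTIFIER is used and nothing else in the grammar changes), the affected rules would read roughly as follows:

```antlr
term
    : IDENTIFIER
    | STRING
    | '(' filter_expr ')'
    ;

FIELDNAME
    : IDENTIFIER ':' ;

// Same definition as the original KEYWORD rule; only the name differs.
IDENTIFIER
    : LETTER (LETTER | NUM_CHAR | '_')* ;
```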
Hope this helps.
-jbb
> Thanks
>
> Scott
>
> Here's the parser.
>
> grammar testGrammer;
>
> options {
> language = Java;
> }
>
> @header {
> package a.b.c;
> }
>
> @lexer::header {
> package a.b.c;
> }
>
> filter:
> 'fq' '=' filter_expr EOF
> ;
>
> term
> : KEYWORD
> | STRING
> | '(' filter_expr ')'
> ;
>
> fieldname
> : FIELDNAME? term
> ;
>
>
> filter_expr:
> fieldname (((AND | OR | NOT))? fieldname)*
> ;
>
> FIELDNAME
> : KEYWORD ':' ;
>
> AND : 'AND' | '&&' ;
> OR : 'OR' | '||' ;
> NOT : 'NOT' | '!' ;
> KEYWORD : LETTER (LETTER | NUM_CHAR | '_')*;
> STRING : '"' NONCONTROL_CHAR* '"' ;
> WS : ' ' | '\t' | '\n' | '\r' | '\u3000' {$channel=HIDDEN; } ;
>
> fragment NONCONTROL_CHAR: LETTER | NUM_CHAR | SPACE | SYMBOL;
> fragment SYMBOL: ' '..'!' | '#'..'/' | ':'..'@' | '['..'`' | '{'..'~';
> fragment LETTER: LOWER | UPPER;
> fragment LOWER: 'a'..'z';
> fragment UPPER: 'A'..'Z';
> fragment NUM_CHAR: '0'..'9';
> fragment SPACE: ' ' | '\t';