[antlr-interest] Help with a parser
Scott Smith
ssmith at mainstreamdata.com
Tue Aug 2 16:11:19 PDT 2011
I assume this is the proper place to put this. I'm trying to build a parser for filters generated by SOLR (lucene.apache.org).
Examples of valid "sentences" the parser should parse are:
fq = fred
fq = (fred OR bill)
fq = harry:(fred OR bill)
fq = (harry:fred OR jack:bill)
fq = ((harry:fred OR bill) AND (jane OR marry:sally))
Terms can be nested to arbitrary levels. The colon binds to the word before it, so a prefix such as "harry:" should be treated as a single unit.
I've listed the parser below (which doesn't work). Can someone suggest what I can do? It seems like a simple problem, but so far I haven't cracked it. I will admit that I've only been playing with ANTLR for the last week or so. I did watch all of Scott Stanchfield's excellent videos on Vimeo, but I'm still confused.
When I run the parser in ANTLRWorks with example 3 ("fq = harry:(fred OR bill)", without the quote marks), it matches "fq = harry:(fred" and then expects the right paren instead of continuing through the rest of the filter_expr rule. What am I missing?
Thanks
Scott
Here's the parser.
grammar testGrammer;
options {
    language = Java;
}

@header {
    package a.b.c;
}

@lexer::header {
    package a.b.c;
}

filter
    : 'fq' '=' filter_expr EOF
    ;

term
    : KEYWORD
    | STRING
    | '(' filter_expr ')'
    ;

fieldname
    : FIELDNAME? term
    ;

filter_expr
    : fieldname (((AND | OR | NOT))? fieldname)*
    ;

FIELDNAME
    : KEYWORD ':' ;
AND : 'AND' | '&&' ;
OR : 'OR' | '||' ;
NOT : 'NOT' | '!' ;
KEYWORD : LETTER (LETTER | NUM_CHAR | '_')*;
STRING : '"' NONCONTROL_CHAR* '"' ;
WS : ' ' | '\t' | '\n' | '\r' | '\u3000' {$channel=HIDDEN; } ;
fragment NONCONTROL_CHAR: LETTER | NUM_CHAR | SPACE | SYMBOL;
fragment SYMBOL: ' '..'!' | '#'..'/' | ':'..'@' | '['..'`' | '{'..'~';
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment NUM_CHAR: '0'..'9';
fragment SPACE: ' ' | '\t';
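
To make the intended language concrete, here is a minimal hand-written recursive-descent recognizer in Java for the sentence shapes listed above: an optional "name:" prefix bound to the following term, parenthesized sub-expressions nested to any depth, and terms joined by an optional AND/OR/NOT. It is only a sketch of the structure I am after, not ANTLR output and not a fix for the grammar. The class name FqRecognizer and all of its methods are made up for illustration, and the tokenizer is an assumption (ASCII words, double-quoted strings, and the symbols used in the examples).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recognizer for:  'fq' '=' expr EOF
//   expr  : field ((AND | OR | NOT)? field)*
//   field : (WORD ':')? term
//   term  : WORD | STRING | '(' expr ')'
final class FqRecognizer {
    // Optional leading whitespace, then a word, a quoted string, or a symbol.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:([A-Za-z][A-Za-z0-9_]*)|(\"[^\"]*\")|(&&|\\|\\||[!=:()]))");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private FqRecognizer(String input) {
        String s = input.trim();
        Matcher m = TOKEN.matcher(s);
        int end = 0;
        while (end < s.length()) {
            m.region(end, s.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("bad character at " + end);
            }
            tokens.add(m.group().trim());
            end = m.end();
        }
    }

    // Returns true if the whole input is a valid "fq = ..." sentence.
    static boolean accepts(String input) {
        try {
            new FqRecognizer(input).filter();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private void expect(String t) {
        if (!t.equals(peek())) {
            throw new IllegalArgumentException("expected " + t + " but saw " + peek());
        }
        pos++;
    }

    private static boolean isOperator(String t) {
        return t != null && (t.equals("AND") || t.equals("&&") || t.equals("OR")
            || t.equals("||") || t.equals("NOT") || t.equals("!"));
    }

    private static boolean isWord(String t) {
        return t != null && Character.isLetter(t.charAt(0)) && !isOperator(t);
    }

    private static boolean isString(String t) { return t != null && t.startsWith("\""); }

    private static boolean startsField(String t) {
        return isWord(t) || isString(t) || "(".equals(t);
    }

    private void filter() {
        expect("fq");
        expect("=");
        expr();
        if (peek() != null) throw new IllegalArgumentException("trailing input: " + peek());
    }

    private void expr() {
        field();
        while (true) {
            if (isOperator(peek())) { pos++; field(); }   // explicit operator
            else if (startsField(peek())) { field(); }    // operator omitted
            else break;
        }
    }

    private void field() {
        // A word directly followed by ':' is a field-name prefix for the next term.
        if (isWord(peek()) && pos + 1 < tokens.size() && tokens.get(pos + 1).equals(":")) {
            pos += 2;
        }
        term();
    }

    private void term() {
        String t = peek();
        if (isWord(t) || isString(t)) {
            pos++;
        } else if ("(".equals(t)) {
            pos++;
            expr();
            expect(")");
        } else {
            throw new IllegalArgumentException("unexpected token: " + t);
        }
    }
}
```

Calling FqRecognizer.accepts("fq = harry:(fred OR bill)") should return true for each of the five example sentences, and false for an input with a missing right paren such as "fq = harry:(fred OR bill". The one design point worth noting is that the "name:" prefix is decided in the parser by looking one token ahead for ':', rather than being glued into a single lexer token.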