[antlr-interest] Lucene grammar

Sun Jan 18 13:01:42 PST 2009

Hi. You should be able to translate the JavaCC version of their query  
language grammar almost directly into ANTLR's format.  That said, the  
previous query parser was terrible.  You'll probably have to either pass  
all words to the parser and let it figure out what to do, or assume  
everything is just a word except for "ID:" type stuff, which the lexer  
returns as a special token.
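
To make the second option concrete, here is a rough sketch in Python rather than ANTLR, just to show the idea. The token names FIELD and WORD, the functions tokenize and parse, and the default field name "contents" are all made up for illustration; none of them come from Lucene or ANTLR. The lexer treats everything as a plain word unless it is followed by optional whitespace and a colon, in which case it is returned as a special token. The whitespace before the colon is consumed inside the FIELD rule itself, so it does not matter when whitespace skipping happens.

```python
import re

# Hypothetical token names (FIELD, WORD, WS); an illustration only.
TOKEN_RE = re.compile(r"""
    (?P<FIELD>\w+)\s*:   # a word, optional whitespace, then a colon
  | (?P<WORD>\w+)        # any other word
  | (?P<WS>\s+)          # whitespace between tokens, skipped
""", re.VERBOSE)


def tokenize(text):
    """Split a query into (kind, value) pairs, kind being FIELD or WORD."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind != "WS":
            # For FIELD, the named group holds only the name, not the colon.
            tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def parse(tokens, default_field="contents"):
    """Pair each word with its field, or with the assumed default field."""
    clauses = []
    field = None
    for kind, value in tokens:
        if kind == "FIELD":
            if field is not None:
                raise ValueError(f"field {field!r} has no term")
            field = value
        else:
            clauses.append((field or default_field, value))
            field = None
    if field is not None:
        raise ValueError(f"field {field!r} has no term")
    return clauses
```

For example, parse(tokenize("foo : bar baz")) pairs bar with the field foo and baz with the default field. The real Lucene syntax also has phrases, ranges, boolean operators and so on, which this sketch ignores.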

Ter
On Jan 18, 2009, at 12:59 PM, Dennis Brothers wrote:

> I'm in the process of creating a parser for Lucene query syntax.  I've
> done a lot of searching, and can't find any useful "prior art".  Is
> anyone aware of a Lucene parser built with ANTLR?
>
> A specific problem I see is that Lucene involves queries of the form
> foo:bar (and there might be whitespace either side of the colon).
> This means "find a record whose foo field contains the word 'bar'".
> To complicate things further, the field name and colon are optional -
> there's a default field.
>
> I'd like to distinguish field names from words in the lexer, but I
> don't see a simple way to do it.  Can I somehow use a syntactic
> predicate in the lexer?  Or a semantic predicate that scans ahead for
> the colon?  In either case, how do I deal with the optional whitespace
> in the lexer?  Would the traditional whitespace-skipping constructs
> take effect before the predicate was tested?
>
> Thanks for any insight -
>     - Dennis Brothers