[antlr-interest] Lucene grammar

Dennis Brothers brothers at bros.com
Sun Jan 18 12:59:31 PST 2009


I'm in the process of creating a parser for Lucene query syntax.  I've  
done a lot of searching, and can't find any useful "prior art".  Is  
anyone aware of a Lucene parser built with ANTLR?

A specific problem I see is that Lucene involves queries of the form  
foo:bar (possibly with whitespace on either side of the colon).   
This means "find a record whose foo field contains the word 'bar'".   
To complicate things further, the field name and colon are optional -  
there's a default field.

I'd like to distinguish field names from words in the lexer, but I  
don't see a simple way to do it.  Can I somehow use a syntactic  
predicate in the lexer?  Or a semantic predicate that scans ahead for  
the colon?  In either case, how do I deal with the optional whitespace  
in the lexer?  Would the traditional whitespace-skipping constructs  
take effect before the predicate was tested?
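
To make the "scan ahead for the colon" idea concrete, here is a rough  
hand-written sketch in Python rather than ANTLR. It is not an ANTLR  
grammar and says nothing about how ANTLR orders whitespace skipping and  
predicates; it only shows the lookahead logic such a predicate would  
need. The token names FIELD, COLON and WORD and the function tokenize  
are made up for illustration, and the sketch ignores the rest of the  
Lucene syntax (quoted phrases, operators, ranges and so on).

```python
import re

# Hypothetical token kinds for illustration only; they are not taken
# from any existing Lucene or ANTLR grammar.
WORD_RE = re.compile(r"[^\s:]+")      # a run of non-space, non-colon characters
COLON_AHEAD_RE = re.compile(r"\s*:")  # optional whitespace, then a colon


def tokenize(query):
    """Split a query into (kind, text) pairs: FIELD, COLON or WORD."""
    tokens = []
    pos = 0
    while pos < len(query):
        ch = query[pos]
        if ch.isspace():
            pos += 1  # skip whitespace between tokens
            continue
        if ch == ":":
            tokens.append(("COLON", ":"))
            pos += 1
            continue
        match = WORD_RE.match(query, pos)
        text = match.group()
        pos = match.end()
        # Lookahead: the word is a field name only if a colon follows,
        # possibly after whitespace. The lookahead consumes no input.
        kind = "FIELD" if COLON_AHEAD_RE.match(query, pos) else "WORD"
        tokens.append((kind, text))
    return tokens
```

With this sketch, tokenize("foo : bar") and tokenize("foo:bar") both  
classify foo as FIELD and bar as WORD, while tokenize("bar") yields a  
single WORD, which the parser could then apply to the default field.  
The point is that the lookahead itself has to step over the optional  
whitespace, independently of the normal whitespace skipping.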

Thanks for any insight -
     - Dennis Brothers


