[antlr-interest] Lucene grammar

Sun Jan 18 23:08:24 PST 2009

At 09:59 19/01/2009, Dennis Brothers wrote:
 >A specific problem I see is that Lucene involves queries of the
 >form foo:bar (and there might be whitespace either side of the
 >colon).
[...]
 >I'd like to distinguish field names from words in the lexer,
 >but I don't see a simple way to do it.  Can I somehow use a
 >syntactic predicate in the lexer?  Or a semantic predicate
 >that scans ahead for the colon?

Yes, you *could* do that, but it's not really a good idea.

Don't try to do too much work in the lexer -- just get it to 
consolidate groups of letters/numbers/etc into generic IDs or 
WORDs or whatever and then figure out what they mean in the 
parser.
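
As a rough illustration of that split (a Python sketch, not ANTLR 
output; the token names WORD and COLON and the functions lex() and 
parse() are made up for this example), the lexer below emits only 
generic tokens and the parser uses one token of lookahead to decide 
whether a WORD is a field name:

```python
import re

# Hypothetical token set: the lexer knows nothing about "fields".
TOKEN_RE = re.compile(r"\s*(?:(?P<WORD>[A-Za-z0-9_]+)|(?P<COLON>:))")


def lex(text):
    """Return a list of (type, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse(tokens):
    """clause : (WORD COLON)? WORD ;  returns (field, term) pairs."""
    clauses = []
    i = 0
    while i < len(tokens):
        kind, value = tokens[i]
        if kind != "WORD":
            raise ValueError(f"expected WORD, got {kind}")
        # The parser, not the lexer, looks ahead for the colon.
        if i + 1 < len(tokens) and tokens[i + 1][0] == "COLON":
            if i + 2 >= len(tokens) or tokens[i + 2][0] != "WORD":
                raise ValueError("expected WORD after ':'")
            clauses.append((value, tokens[i + 2][1]))
            i += 3
        else:
            clauses.append((None, value))
            i += 1
    return clauses
```

Because whitespace is dropped in the lexer, "foo:bar" and "foo : bar" 
produce the same token stream, so the whitespace around the colon 
stops being a problem.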

If you're generating an AST, you can change the type of the token 
in the output AST once you know more about the context in which it 
is used.
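
The retyping idea can be sketched the same way. This is again plain 
Python rather than ANTLR tree-construction syntax, and the Node class 
and the FIELD/TERM/QUERY type names are assumptions for illustration 
only. Each node starts out with the generic WORD type from the lexer 
and is given a more specific type once the parser has seen its 
context:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    text: str
    children: list = field(default_factory=list)


def build_ast(tokens):
    """tokens: (type, text) pairs using only WORD and COLON."""
    root = Node("QUERY", "")
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind != "WORD":
            raise ValueError(f"expected WORD, got {kind}")
        node = Node(kind, text)  # still a generic WORD at this point
        if i + 1 < len(tokens) and tokens[i + 1][0] == "COLON":
            if i + 2 >= len(tokens) or tokens[i + 2][0] != "WORD":
                raise ValueError("expected WORD after ':'")
            node.type = "FIELD"  # retype now that the context is known
            node.children.append(Node("TERM", tokens[i + 2][1]))
            i += 3
        else:
            node.type = "TERM"
            i += 1
        root.children.append(node)
    return root
```

Later tree-walking code can then switch on FIELD versus TERM without 
the lexer ever having had to tell them apart.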