[antlr-interest] org.apache.jakarta\lucene

Thu May 27 13:16:47 PDT 2004

Hi All,
  I'm faced with the prospect of text-searching COBOL, PL/I, Java, and other
source files.  The infrastructure has a Lucene setup but no language-specific
TokenStream implementations.  As noted here earlier, Lucene has its own
TokenStream, just as ANTLR does.

1. How can an ANTLR lexer be used to fulfill the Lucene requirement for a
TokenStream?
2. Generally, why would I want to tokenize differently for different
languages?  How does that affect my results?
3. What are the issues with comments?  Do I have to decide whether to
search them or eliminate them?

Thanks in advance for your help.

Jim

P.S. Did I hear something about an ANTLR workshop?
