[antlr-interest] org.apache.jakarta\lucene

Jim O'Connor Jim.OConnor at microfocus.com
Thu May 27 13:16:47 PDT 2004


Hi All,
  I'm faced with the prospect of text-searching COBOL, PL/I, Java, etc.
source files.  The infrastructure has a Lucene setup but no language-specific
TokenStream implementations.  As noted here earlier, Lucene has a TokenStream
of its own, separate from ANTLR's.
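Both libraries use a pull-style token interface. An ANTLR 2 lexer hands out tokens through `nextToken()`, and in the Lucene 1.x API a `TokenStream` returns `Token`s from `next()`, with `null` marking the end of the stream. One plausible bridge, then, is an adapter that pulls from the lexer, skips token types not worth indexing, and repackages the rest as terms with character offsets. The sketch below is a self-contained illustration under that assumption: `LexToken`, `SimpleLexer`, `IndexTerm` and `LexerTermStream` are hypothetical stand-ins, not the real `antlr.*` or `org.apache.lucene.analysis.*` classes. A real version would subclass Lucene's `TokenStream` and call the generated lexer instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Token from the hypothetical lexer (shaped loosely like an ANTLR 2 token). */
final class LexToken {
    static final int EOF = 0, IDENT = 1, NUMBER = 2, OTHER = 3;
    final int type, start, end; // start: first char offset; end: one past the last char
    final String text;

    LexToken(int type, String text, int start, int end) {
        this.type = type; this.text = text; this.start = start; this.end = end;
    }
}

/** Stand-in for a generated lexer: identifiers, integers, single-char punctuation. */
final class SimpleLexer {
    private final String src;
    private int pos = 0;

    SimpleLexer(String src) { this.src = src; }

    /** Returns the next token, or an EOF token once the input is exhausted. */
    LexToken nextToken() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        int start = pos;
        if (pos >= src.length()) return new LexToken(LexToken.EOF, "", start, start);
        char c = src.charAt(pos);
        if (Character.isLetter(c) || c == '_') {
            while (pos < src.length()
                    && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) pos++;
            return new LexToken(LexToken.IDENT, src.substring(start, pos), start, pos);
        }
        if (Character.isDigit(c)) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return new LexToken(LexToken.NUMBER, src.substring(start, pos), start, pos);
        }
        pos++; // any other character is a one-char token
        return new LexToken(LexToken.OTHER, String.valueOf(c), start, pos);
    }
}

/** What the index consumes: term text plus offsets, like Lucene's Token(text, start, end). */
final class IndexTerm {
    final String text;
    final int start, end;

    IndexTerm(String text, int start, int end) { this.text = text; this.start = start; this.end = end; }

    @Override
    public String toString() { return text + "[" + start + "," + end + ")"; }
}

/** Adapter: pulls lexer tokens and hands back indexable terms; null marks end of stream. */
final class LexerTermStream {
    private final SimpleLexer lexer;

    LexerTermStream(SimpleLexer lexer) { this.lexer = lexer; }

    IndexTerm next() {
        while (true) {
            LexToken t = lexer.nextToken();
            if (t.type == LexToken.EOF) return null;
            if (t.type == LexToken.IDENT || t.type == LexToken.NUMBER) {
                // Lower-case so that searches are case-insensitive.
                return new IndexTerm(t.text.toLowerCase(), t.start, t.end);
            }
            // OTHER tokens (operators, punctuation) are skipped, not indexed.
        }
    }
}

class LexerTermStreamDemo {
    public static void main(String[] args) {
        LexerTermStream terms = new LexerTermStream(new SimpleLexer("int total = count + 42;"));
        List<String> out = new ArrayList<>();
        for (IndexTerm t = terms.next(); t != null; t = terms.next()) out.add(t.toString());
        System.out.println(out); // prints [int[0,3), total[4,9), count[12,17), 42[20,22)]
    }
}
```

The main design decision sits in `LexerTermStream.next()`: which token types become search terms. Because the lexer already knows each token's type, that filter is a per-language policy rather than a generic whitespace split.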

 
 
1. How can an ANTLR lexer be used to fulfill Lucene's requirement for a
TokenStream?
2. Generally, why would I want to tokenize differently for different
languages?  How does that affect my results?
3. What are the issues with comments?  Do I have to decide whether to
search them or eliminate them?
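Questions 2 and 3 can be made concrete with a small experiment. COBOL allows hyphens inside user-defined words such as `CUSTOMER-NAME`, while in Java `a-b` is a subtraction. A tokenizer that splits on every non-alphanumeric character therefore indexes the COBOL name as two unrelated terms. Comments raise a similar choice: if the tokenizer recognizes them, the indexer can keep or drop their words explicitly. The sketch below is hypothetical (`SourceTokenizer`, its `Lang` enum and the `indexComments` flag are invented for illustration) and deliberately simplified. It only recognizes fixed-format COBOL comment lines (`*` in column 7) and Java `//` comments, and it ignores string literals and `/* */` blocks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-language tokenizer: word rules and comment handling differ by language. */
final class SourceTokenizer {
    enum Lang { JAVA, COBOL }

    /** Splits source into lower-cased terms; comment text is indexed only if asked. */
    static List<String> terms(String src, Lang lang, boolean indexComments) {
        List<String> out = new ArrayList<>();
        for (String line : src.split("\n", -1)) {
            String code;
            String comment;
            if (lang == Lang.COBOL) {
                // Fixed format: columns 1-6 sequence area, column 7 indicator ('*' marks a
                // comment line), program text from column 8 on.
                String body = line.length() > 7 ? line.substring(7) : "";
                boolean isComment = line.length() > 6 && line.charAt(6) == '*';
                code = isComment ? "" : body;
                comment = isComment ? body : "";
            } else {
                // Java "//" comments only; string literals and /* */ blocks are not handled here.
                int i = line.indexOf("//");
                code = i >= 0 ? line.substring(0, i) : line;
                comment = i >= 0 ? line.substring(i + 2) : "";
            }
            addWords(code, lang, out);
            if (indexComments) {
                addWords(comment, lang, out);
            }
        }
        return out;
    }

    /** Letters, digits and '_' form words; in COBOL a hyphen between word characters does too. */
    private static void addWords(String text, Lang lang, List<String> out) {
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean innerHyphen = lang == Lang.COBOL && c == '-' && word.length() > 0
                    && i + 1 < text.length() && Character.isLetterOrDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || c == '_' || innerHyphen) {
                word.append(c);
            } else if (word.length() > 0) {
                out.add(word.toString().toLowerCase());
                word.setLength(0);
            }
        }
        if (word.length() > 0) {
            out.add(word.toString().toLowerCase());
        }
    }
}

class SourceTokenizerDemo {
    public static void main(String[] args) {
        String cobol = "      * GET CUSTOMER-NAME\n       MOVE CUSTOMER-NAME TO OUT-REC.";
        System.out.println(SourceTokenizer.terms(cobol, SourceTokenizer.Lang.COBOL, false));
        // prints [move, customer-name, to, out-rec]
        System.out.println(SourceTokenizer.terms(cobol, SourceTokenizer.Lang.COBOL, true));
        // prints [get, customer-name, move, customer-name, to, out-rec]
        System.out.println(SourceTokenizer.terms(cobol, SourceTokenizer.Lang.JAVA, false));
        // prints [get, customer, name, move, customer, name, to, out, rec]
    }
}
```

Treating the COBOL fragment with Java rules both splits `customer-name` and indexes the comment line as if it were code. That is the kind of difference in results that question 2 is about. Whether to index comments remains a policy decision; one option, an assumption rather than anything Lucene requires, is to put comment words in a separate field so each search can choose.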
 
Thanks in advance for your help.
 
Jim
 
P.S. Did I hear something about an ANTLR Workshop?