[antlr-interest] org.apache.jakarta\lucene
Jim O'Connor
Jim.OConnor at microfocus.com
Thu May 27 13:16:47 PDT 2004
Hi All,
I'm faced with the prospect of text-searching COBOL, PL/I, Java, and other
source files. The infrastructure has a Lucene setup but no specific TokenStream
implementations. As noted here earlier, Lucene has its own TokenStream, as ANTLR does.
1. How can an ANTLR lexer be used to fulfill the Lucene requirement for a
TokenStream?
2. Generally, why would I want to tokenize differently for different
languages? How does that affect my results?
3. What are the issues with comments? Do I have to decide whether to
search them or eliminate them?
Thanks in advance for your help.
Jim
P.S. Did I hear something about an ANTLR workshop?