[antlr-interest] org.apache.jakarta\lucene

Thu May 27 13:29:52 PDT 2004

On May 27, 2004, at 1:16 PM, Jim O'Connor wrote:
> 1. How can an Antlr lexer be used to fulfill the lucene requirement 
> for a TokenStream?
I'm not familiar with that, but I imagine you'd just keep calling 
nextToken() on the lexer and passing each token's text to Lucene.
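
A rough sketch of that idea follows. It is not written against the real 
ANTLR or Lucene classes: LexToken, Lexer, TermStream and whitespaceLexer 
are hypothetical stand-ins made up for illustration, so the names and 
signatures would need adjusting to whatever the two libraries actually 
expose in your versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LexerAdapterSketch {

    /** Stand-in for a lexer token; hypothetical, not the ANTLR class. */
    static final class LexToken {
        static final int EOF = -1;   // assumed end-of-input marker
        static final int WORD = 1;   // assumed type for ordinary tokens
        final int type;
        final String text;
        LexToken(int type, String text) { this.type = type; this.text = text; }
    }

    /** Stand-in for a generated lexer: one token per call, EOF at the end. */
    interface Lexer { LexToken nextToken(); }

    /** Stand-in for what an indexer pulls from: next term, or null when done. */
    interface TermStream { String next(); }

    /** Keep grabbing nextToken() and hand only the token text to the consumer. */
    static TermStream adapt(final Lexer lexer) {
        return new TermStream() {
            private boolean done = false;
            public String next() {
                if (done) return null;
                LexToken t = lexer.nextToken();
                if (t == null || t.type == LexToken.EOF) { done = true; return null; }
                return t.text;
            }
        };
    }

    /** Toy lexer that splits on whitespace, only to exercise the adapter. */
    static Lexer whitespaceLexer(String input) {
        String trimmed = input.trim();
        List<String> words = trimmed.isEmpty()
                ? new ArrayList<String>()
                : Arrays.asList(trimmed.split("\\s+"));
        final Iterator<String> it = words.iterator();
        return new Lexer() {
            public LexToken nextToken() {
                return it.hasNext() ? new LexToken(LexToken.WORD, it.next())
                                    : new LexToken(LexToken.EOF, "");
            }
        };
    }

    /** Pull every term out of the stream into a list. */
    static List<String> drain(TermStream stream) {
        List<String> out = new ArrayList<String>();
        for (String term = stream.next(); term != null; term = stream.next()) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(adapt(whitespaceLexer("int x = 5 ;"))));
    }
}
```

The adapter is the only interesting part: it hides token types from the 
consumer and turns the lexer's end-of-input token into "no more terms".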

>
> 2. Generally, why would I want to tokenize differently for different 
> languages? How does that affect my results?
I guess it depends on what you want to search.  Are you going to provide 
an index for variable names only?  Do you want to exclude comments from 
the search?  Different languages will also have different punctuation.  
For instance, you might want some-valid-id treated as a single token, 
but not 5-3.
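
To make the punctuation point concrete, here is a small sketch using 
java.util.regex. The two rule sets (HYPHEN_IDS for a language that allows 
hyphens inside identifiers, C_LIKE for one that does not) are assumptions 
invented for this example, not the grammar of any particular language.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenRulesSketch {

    // Assumed rules: identifiers may contain inner hyphens, numbers may not.
    static final Pattern HYPHEN_IDS =
            Pattern.compile("[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*|\\d+|\\S");

    // Assumed rules: hyphen is always an operator, never part of a name.
    static final Pattern C_LIKE =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|\\S");

    /** Collect each match of the rule set as one token; whitespace is skipped. */
    static List<String> tokenize(Pattern rules, String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = rules.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(HYPHEN_IDS, "some-valid-id"));
        System.out.println(tokenize(HYPHEN_IDS, "5-3"));
        System.out.println(tokenize(C_LIKE, "some-valid-id"));
    }
}
```

Under the first rule set a search for some-valid-id can match the whole 
name, while under the second it is indexed as separate pieces, so the 
same query returns different hits depending on the tokenizer.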

> 3.  What are the issues with comments?  Do I have to decide to 
> search/eliminate them?
Who else would decide?

> P.S. Did I hear something about an Antlr Workshop?
One is in the works for late July or early August.  Details will be 
forthcoming on the mailing list.

Monty Zukowski

ANTLR & Java Consultant -- http://www.codetransform.com
ANSI C/GCC transformation toolkit -- 
http://www.codetransform.com/gcc.html
Embrace the Decay -- http://www.codetransform.com/EmbraceDecay.html
