A couple of points about Lucene features, in reply to Karen: Lucene does
do stemming natively, in the analyser (use the PorterStemFilter class,
which is part of the Lucene distribution). Fuzzy searching can be
bizarre if you use the default level, but you can control the degree of
desired fuzziness: if you're using the query parser, put a number
between 0 and 1 after the tilde (e.g. "horse~0.6" is fuzzier than
"horse~0.9"). I find 0.75 is about right (but I'm indexing raw
multilingual OCR, so bizarre is good). For those who haven't seen it,
there's more on query syntax here:
http://lucene.apache.org/java/docs/queryparsersyntax.html .
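
As a rough illustration of the tilde syntax described above, here is a
minimal sketch that only builds the query text you would hand to the
query parser. It makes no Lucene calls; the class and method names
(FuzzyQueryText, fuzzy, fuzzyAll) are made up for this example, and it
assumes plain terms with no query-syntax characters that need escaping.

```java
public class FuzzyQueryText {

    // Hypothetical helper: append "~<similarity>" to a single term,
    // e.g. fuzzy("horse", 0.6) gives "horse~0.6".
    // Lower values are fuzzier, per the message above.
    public static String fuzzy(String term, double minSimilarity) {
        if (term == null || term.isEmpty() || term.matches(".*\\s.*")) {
            throw new IllegalArgumentException("expected a single non-empty term");
        }
        if (!(minSimilarity > 0.0 && minSimilarity < 1.0)) {
            throw new IllegalArgumentException("similarity must be between 0 and 1");
        }
        // Double.toString is locale-independent, so 0.75 is always "0.75".
        return term + "~" + Double.toString(minSimilarity);
    }

    // Hypothetical helper: apply the same fuzziness to every
    // whitespace-separated term in the user's input.
    public static String fuzzyAll(String input, double minSimilarity) {
        String trimmed = input == null ? "" : input.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (String term : trimmed.split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(fuzzy(term, minSimilarity));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fuzzy("horse", 0.6));          // fuzzier
        System.out.println(fuzzy("horse", 0.9));          // stricter
        System.out.println(fuzzyAll("horse cart", 0.75)); // every term at 0.75
    }
}
```

The resulting string is what would go to the query parser in place of
the raw user input; making the similarity a parameter keeps the degree
of fuzziness a per-application choice rather than the default.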

I agree that what were once bells and whistles are now essential to the
kind of search interface we need to build. My sense is that Lucene has
the machinery to do just about everything you need in a good modern
search interface, but it's up to the implementation to put all the
pieces together.

Peter

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
K.G. Schneider
Sent: Tuesday, May 30, 2006 9:16 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] fun with kinosearch

> I'm not sure where stemming comes in (does Lucene do this?), it seems
> faceted browsing could be handled by something like Carrot2.  Rumor
> has it Solr has faceting support somewhere, as well.  At least,
> according to the 9s project.  http://www.nines.org/
>
> -Ross.

Lucene doesn't have native stemming; it does do fuzzy search, but you
don't want that. (Trussssssst me, through a funky series of events I
recently evaluated Lucene with fuzzy search enabled, and it was
bizarre.) Lucene is used as a building block for other search engines.
It does support quite a few capabilities. I have seen it used in
conjunction with the Porter stemming algorithm and with spell-checkers
of various flavors.

But again--and probably only because I have been testing search engines
for several months and am starting to get a little cabin fever--I want
to clarify that I'm not piling on the fact that Kino can't do it all. As
a component, it could be great, and that it's in Perl is a biggie. I was
(awkwardly) addressing my concern that some fundamental search
capabilities appeared to have been labeled "creeping featureitis." I
would just be careful about that kind of terminology. I doubt Eric meant
anything seriously by it. I just know the long uphill battle it can be
to provide quality search, and I wouldn't want someone as distinguished
as Eric quoted in support of compromising the user experience.

K.G. Schneider
[log in to unmask]