Hi, Alberto.

We haven't rolled it out yet, but at the University of Virginia we've
had great success with Solr/Lucene. You can take a look at what we've
done here:

Solr itself meets all the criteria you listed, and Erik Hatcher wrote
some very nice code for us to import our MARC records (it's open
source, although I'm not sure it's in a publicly accessible repository
right now). We currently have about 4 million MARC records, plus EAD,
TEI, and GDMS XML files, plus some HTML files that Erik indexed as a
proof of concept. The front end is Ruby on Rails, and the mapping
files for cross-walking the XML or MARC into Solr are very easy to
configure, even for a non-programmer. In fact, we are planning to just
hand these files to our cataloguers and let them hash things out
themselves.
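To give a rough idea of what such a mapping does, here is a
hypothetical sketch. It is not Erik's importer. The FIELD_MAP table,
the Solr field names, and the simplified record format (a list of
(tag, subfield, value) tuples instead of binary MARC) are all
assumptions made for illustration. The point is that the crosswalk
itself is just a table that a cataloguer could edit without touching
the code.

```python
from collections import defaultdict

# Hypothetical mapping: Solr field name -> list of (MARC tag, subfield code).
# In a real setup this table would live in a config file the cataloguers edit.
FIELD_MAP = {
    "title": [("245", "a"), ("245", "b")],
    "author": [("100", "a"), ("700", "a")],
    "subject": [("650", "a")],
    "isbn": [("020", "a")],
}


def crosswalk(record, field_map=FIELD_MAP):
    """Turn a simplified MARC record into a Solr-style document (a dict).

    `record` is assumed to be a list of (tag, subfield_code, value) tuples;
    a real importer would parse binary MARC instead.
    """
    # Group subfield values by (tag, code) so each mapping lookup is direct.
    by_source = defaultdict(list)
    for tag, code, value in record:
        by_source[(tag, code)].append(value.strip())

    doc = {}
    for solr_field, sources in field_map.items():
        # Collect values from every MARC source mapped to this Solr field.
        values = [v for src in sources for v in by_source.get(src, [])]
        if values:  # omit fields with no data rather than emitting empty lists
            doc[solr_field] = values
    return doc


sample = [
    ("100", "a", "Twain, Mark."),
    ("245", "a", "Adventures of Huckleberry Finn /"),
    ("650", "a", "Boys"),
    ("650", "a", "Mississippi River"),
]
print(crosswalk(sample))
# {'title': ['Adventures of Huckleberry Finn /'], 'author': ['Twain, Mark.'], 'subject': ['Boys', 'Mississippi River']}
```

Changing what goes into the index then means editing FIELD_MAP, not
the loop that applies it.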

I'm currently working on a formal deployment plan, which we need
before we can make this a production service; it will include
workflows for syncing the data with our Sirsi ILS. If this sounds
interesting and you want more info, I or other folks at UVa would be
happy to talk with you about it. Or, if you need to roll your own
system, I would highly recommend Solr. It makes Lucene very easy to
work with and adds a lot of functionality on top of it, such as an
HTTP interface and faceted search.
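Most of the requirements in your list map fairly directly onto Solr
request parameters and Lucene query syntax. The sketch below only
builds a request URL and does not contact a server. The endpoint
http://localhost:8983/solr/select and the field names title, author,
subject, and abstract are assumptions for illustration. Fielded
search, boolean operators, and proximity ("phrase"~N) all go in the q
parameter, and highlighting is switched on with hl and hl.fl. Stemming
and synonyms are not query parameters; they come from the analyzers
configured for each field in Solr's schema.xml.

```python
from urllib.parse import urlencode

# Assumed local Solr endpoint; host, port, and path depend on the install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(query, highlight_fields=(), rows=10):
    """Build a Solr select URL for a Lucene-syntax query (no request is sent)."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if highlight_fields:
        # hl turns highlighting on; hl.fl lists the fields to highlight.
        params += [("hl", "true"), ("hl.fl", ",".join(highlight_fields))]
    return SOLR_SELECT + "?" + urlencode(params)


# Fielded, boolean, and proximity search combined: "black" and "hole"
# within 5 positions of each other in the (hypothetical) title field.
q = 'title:"black hole"~5 AND author:hawking NOT subject:review'
print(build_query_url(q, highlight_fields=["title", "abstract"]))
```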


On Jul 18, 2007, at 10:45 AM, Alberto Accomazzi wrote:

> Our project is looking to transition to a new search engine to handle
> our bibliographic databases (5.5M records of bibliographic article
> metadata + 0.6M fulltext articles).  What we are looking for is
> something easily tweakable, which offers fielded searches,
> boolean/simple search logic, customizable relevance ranking,
> proximity, highlighting, synonym/stemming matching.  Needs to run on a
> Linux 64-bit box.  The packages I am aware of are:
> 1. lucene/clucene/lucy
> 2. kinosearch
> 3. xapian
> 4. zebra
> 5. invenio
> Am I missing any from the list?  Are any of these to be excluded based
> on our requirements?  I'd like to hear experiences from people who are
> using or have used these packages.
> -- Alberto
> ********************************************************************
> Dr. Alberto Accomazzi                  aaccomazzi(at)cfa harvard edu
> NASA Astrophysics Data System              
> Harvard-Smithsonian Center for Astrophysics
> 60 Garden St, MS 67, Cambridge, MA 02138, USA
> ********************************************************************

Elizabeth (Bess) Sadler
Head, Technical and Metadata Services
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

(434) 243-2305