LISTSERV 16.5 - CODE4LIB Archives

I did not have stellar results experimenting with a similar approach to
Eric's.  The crawler we use is from Thunderstone, and it does a fine job
of indexing web content with very nice relevancy ranking and "did you
mean" spell-check.  What I found when trying to let it loose against
multiple servers is, when it hits our OPAC, it sees several different
formats per record and ends up more than triple-indexing each title.  It
does have a lot of flexibility in the indexing options, though, so I
could try it again and set it to ignore URL patterns that refer to the
MARC display, etc.  Still, the total cost to index a couple million
pages (which would be needed in order to include all the records in the
OPAC plus the website pages plus the Syndetics added content) is a bit
of a steep one-time outlay.  I'm sure there's some other way to go about
this with Thunderstone's TEXIS rather than using their Webinator
product, but then you have a substantially higher development effort, I
think.  
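
A rough sketch of the kind of URL-pattern exclusion mentioned above, written in Python rather than in Webinator's own configuration (which is not shown here). The patterns and the example.org URLs are hypothetical placeholders, not our OPAC's actual URL scheme; the idea is simply to skip the alternate displays so each title is indexed once.

```python
import re

# Hypothetical patterns for alternate record displays (MARC view, staff
# view, etc.). A real OPAC's URLs will differ; adapt these as needed.
EXCLUDE_PATTERNS = [
    re.compile(r"[?&]format=marc\b", re.IGNORECASE),
    re.compile(r"/marc/", re.IGNORECASE),
    re.compile(r"[?&]view=(staff|brief)\b", re.IGNORECASE),
]


def should_index(url: str) -> bool:
    """Return True unless the URL matches one of the exclusion patterns."""
    return not any(pattern.search(url) for pattern in EXCLUDE_PATTERNS)


def filter_urls(urls):
    """Keep only the URLs a crawler should index, preserving their order."""
    return [url for url in urls if should_index(url)]


if __name__ == "__main__":
    candidates = [
        "http://opac.example.org/record/123",
        "http://opac.example.org/record/123?format=marc",
        "http://opac.example.org/record/123?view=staff",
    ]
    for url in filter_urls(candidates):
        print(url)
```

With filtering like this in place, the document count (and so the licensed page count) should come down to roughly one page per record.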

Thunderstone now makes a faceted search (they call it parametric
search).  They also make search appliances at different capacity levels.
Pricing is really pretty reasonable for what you get.
http://www.thunderstone.com/texis/site/pages/Products.html
 
 
 
Genny Engel
Internet Librarian
Sonoma County Library
[log in to unmask]
707 545-0831 x581
www.sonomalibrary.org
 


>>> [log in to unmask] 07/11/08 07:36AM >>>
> In short, I think a Google Appliance is an expensive but viable option.
Relative to other commercial products in the space, the GA or G-mini is
actually very inexpensive.  Another option to add to Eric's list is the
All Access Connector, which adds MuseGlobal's fed search technology to
the Google appliance.  Of course, it also adds $40K or more to the total
price.
http://wire.jstirnaman.com/2008/05/23/federated-search-for-google-search-appliance/


Jason
-- 

Jason Stirnaman
Digital Projects Librarian/School of Medicine Support
A.R. Dykes Library, University of Kansas Medical Center
[log in to unmask] 
913-588-7319


>>> On 7/10/2008 at 10:25 PM, in message
<[log in to unmask]>, Eric Lease Morgan
<[log in to unmask]> wrote:
> At the risk of interpreting the original question incorrectly, we  
> have had decent success using the Google Search Appliance to  
> facilitate search across the enterprise (university):
> 
>    * Buy the Appliance.
>    * Feed it one or more URLs.
>    * Wait for it to crawl.
>    * Customize the user interface.
>    * Allow people to use it.
> 
> While we haven't done so, it would not be too difficult to implement
> a sort of federated search within the Appliance's interface. This
> could be done in a number of ways:
> 
>    1. Acquire bibliographic data and
>       feed it directly to the Appliance
>       via the (poorly) documented SQL
>       interface.
> 
>    2. Acquire bibliographic data, save
>       it as HTML files, and allow the
>       Appliance to crawl the HTML.
> 
>    3. License access to bibliographic
>       data, making sure it is accessible
>       through some sort of API, and write
>       a Google OneBox module that queries
>       the data and returns results as a
>       part of a normal Google Appliance
>       search.
> 
> The larger Google Appliance costs about $30,000 but you purchase it,
> not license it. No annual fees. That will buy you the ability to
> index 500,000 documents. When it comes to a bibliographic database  
> (such as a subject index or a library catalog) that is not really  
> very much.
> 
> We here at Notre Dame did implement Option #3, but it queries the
> local LDAP server to return names and addresses of people, not
> bibliographic citations. [1, 2] I did write a OneBox module to query
> our catalog, but we haven't implemented it, yet. It will probably
> appear as a part of the library's Search This Site functionality.
> 
> In short, I think a Google Appliance is an expensive but viable option.
> 
> [1] Search for a name (ex: Hesburgh) at http://search.nd.edu/ 
> [2] OneBox source code - http://tinyurl.com/6ktxot
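
Option 2 in Eric's list (save the bibliographic data as HTML files and let the Appliance crawl them) might look roughly like the following in Python. This is only a sketch under assumptions: the record fields (id, title, author), the file naming, and the page layout are all hypothetical, and nothing here is specific to the Google Appliance beyond producing plain HTML pages for a crawler to find.

```python
import html
from pathlib import Path


def record_to_html(record: dict) -> str:
    """Render one bibliographic record (hypothetical fields) as a small HTML page."""
    title = html.escape(record.get("title", ""))
    author = html.escape(record.get("author", ""))
    return (
        "<html><head>"
        f"<title>{title}</title>"
        f'<meta name="author" content="{author}">'
        "</head><body>"
        f"<h1>{title}</h1><p>{author}</p>"
        "</body></html>"
    )


def write_records(records, out_dir):
    """Write one HTML file per record into out_dir and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for record in records:
        path = out / f"{record['id']}.html"
        path.write_text(record_to_html(record), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    sample = [{"id": "0001", "title": "Moby Dick", "author": "Melville, Herman"}]
    for path in write_records(sample, "records_html"):
        print(path)
```

Each record becomes one crawlable page, so every record counts against the Appliance's document limit, which ties back to the 500,000-document figure above.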