I've been asked to create a "smart search" a.k.a. "be like Google without any
of the resources" ;) for our upcoming new library home page.  One of the
things they're looking for is a "did you mean?" -- but plugging into the
Google API only nets me 1,000 hits a day.  Now, granted, I don't think our
library home page will suffer that much from _only_ 1,000 hits, but I'd
rather build something that can scale than not.

If it's a web service, it'd be a snap to integrate into my app - so I'm very
interested =).

I did see something on #code4lib about an Ockham server?
Would/should that be part of this?


Andrew Forman
University of Iowa Libraries
ISST Development
319 335 9152

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric
Lease Morgan
Sent: Tuesday, September 13, 2005 8:45 AM
To: [log in to unmask]
Subject: [CODE4LIB] spelling server

What do y'all think of the idea of a spelling server -- a Web service taking
a word as input and returning a list of alternative spellings.

MyLibrary@Ockham has indexed about 430,000 OAI records. These records have
been grossly classified into a number of domains such as mathematics, life
science, theses & dissertations, and a master domain consisting of all the
subdomains.

Taking a hint from Bill Moseley (of swish-e fame), I have read the indexes,
parsed out the individual words, and fed them to GNU ASPELL, a dictionary
program. It is then possible to query ASPELL and have it return alternative
spellings. We have incorporated this feature into [log in to unmask]
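
For illustration only, here is a rough sketch of that pipeline: parse words
out of some record text, keep them as a word list, and ask for near matches.
The sample records and function names are made up, and Python's difflib
stands in for GNU ASPELL so the sketch runs without a dictionary installed.
It is not the MyLibrary@Ockham code, and ASPELL's suggestions would differ.

```python
import re
from difflib import get_close_matches

# Hypothetical sample records standing in for the indexed OAI metadata.
RECORDS = [
    "Introduction to linear algebra and differential equations",
    "Theses and dissertations in the life sciences",
    "Algebraic topology for mathematics students",
]


def build_word_list(records):
    """Parse out the individual words, lowercased and de-duplicated."""
    words = set()
    for record in records:
        words.update(re.findall(r"[a-z]+", record.lower()))
    return sorted(words)


def suggest(word, word_list, limit=5):
    """Return up to `limit` known words that resemble `word`, best first.

    difflib is a stand-in here; the original message uses GNU ASPELL.
    """
    return get_close_matches(word.lower(), word_list, n=limit, cutoff=0.7)


WORD_LIST = build_word_list(RECORDS)
print(suggest("algebre", WORD_LIST))
```

Because the word list comes from the records themselves, the suggestions are
limited to words that actually occur in the collection, which is the point of
feeding the index to the dictionary in the first place.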

I could make this spell checking functionality available as a Web service.
The URL could look something like this:

The output could look something like this:

<?xml version='1.0'?>

It would then be up to the client to do with the content of the spelling
elements as they desired. For example, the client could:

   * spell check a document
   * implement a Did You Mean? service a la Google
   * incorporate the results into a Find More Like This One search
   * enhance the results of an OPAC search
   * feed selected words back to the spelling server
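
As a sketch of the "Did You Mean?" case, a client might do something like
the following. The sample response is invented for the example: only the
"spelling" element name comes from the description above, while the
"spellings" root element and its "word" attribute are assumptions, as is the
did_you_mean function itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; the real output format has not been defined yet.
SAMPLE_RESPONSE = """<?xml version='1.0'?>
<spellings word="algebre">
  <spelling>algebra</spelling>
  <spelling>algebraic</spelling>
</spellings>"""


def did_you_mean(response_xml, query):
    """Return a prompt built from the first alternative spelling, or None."""
    root = ET.fromstring(response_xml)
    # Collect the text of every spelling element, skipping empty ones.
    candidates = [e.text for e in root.iter("spelling") if e.text]
    # Do not suggest the word the user already typed.
    candidates = [c for c in candidates if c.lower() != query.lower()]
    if not candidates:
        return None
    return "Did you mean: %s?" % candidates[0]


print(did_you_mean(SAMPLE_RESPONSE, "algebre"))
```

The other uses in the list would start the same way, by pulling the text out
of the spelling elements, and differ only in what they do with it afterwards.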

Alternative URLs might include:

Writing the underlying script would be easy. Articulating an XML stream as
output would be harder.
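
One possible shape for that XML stream, sketched with Python's standard
library. The "spellings" root element, the "word" attribute, and the
spellings_to_xml function are all assumptions made for the example; nothing
here is a settled format.

```python
import xml.etree.ElementTree as ET


def spellings_to_xml(word, spellings):
    """Serialize a word and its alternative spellings as an XML string."""
    # Root element and attribute names are hypothetical.
    root = ET.Element("spellings", {"word": word})
    for spelling in spellings:
        ET.SubElement(root, "spelling").text = spelling
    # ElementTree escapes characters such as & and < in text and attributes.
    body = ET.tostring(root, encoding="unicode")
    return "<?xml version='1.0'?>\n" + body


print(spellings_to_xml("algebre", ["algebra", "algebraic"]))
```

Letting a library build the elements, rather than concatenating strings by
hand, keeps the output well-formed even when a word contains markup
characters.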

What do y'all thinque? It would be fun at the very least.

Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604