I've been asked to create a "smart search", a.k.a. "be like Google without any
of the resources" ;) for our upcoming new library home page. One of the
things they're looking for is a "did you mean?" -- but plugging into the
Google API only nets me 1,000 hits a day. Now, granted, I don't think our
library home page will suffer that much from _only_ 1,000 hits, but I'd
rather build something that can scale than not.
If it's a web service, it'd be a snap to integrate into my app - so I'm very
I did see something on #code4lib about an ockham server?
Would/should that be part of this?
University of Iowa Libraries
319 335 9152
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric
Sent: Tuesday, September 13, 2005 8:45 AM
To: [log in to unmask]
Subject: [CODE4LIB] spelling server
What do y'all think of the idea of a spelling server -- a Web service taking
a word as input and returning a list of alternative spellings.
MyLibrary@Ockham has indexed about 430,000 OAI records. These records have been
grossly classified into a number of domains such as mathematics, life
science, theses & dissertations, and a master domain consisting of all the
Taking a hint from Bill Mosely (of swish-e fame), I have read the indexes,
parsed out the individual words, and fed them to GNU ASPELL, a dictionary
program. It is then possible to query ASPELL and have it return alternative
spellings. We have incorporated this feature into [log in to unmask]
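
To make the parse-the-words-then-ask-for-alternatives idea concrete, here is a rough Python sketch. It does not call ASPELL at all; it swaps in the standard library's difflib as a stand-in, so the ranking will differ from what a real dictionary returns. The sample records and the names build_word_list and suggest are made up for illustration.

```python
import re
from difflib import get_close_matches


def build_word_list(texts):
    """Parse individual words out of indexed text (hypothetical helper)."""
    words = set()
    for text in texts:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(words)


def suggest(word, word_list, limit=5):
    """Return up to `limit` alternative spellings, closest first.

    difflib is a stand-in for ASPELL here, not the method described above.
    """
    return get_close_matches(word.lower(), word_list, n=limit, cutoff=0.7)


# Made-up sample records, not the actual OAI data.
records = ["Library science and mathematics", "Theses and dissertations"]
word_list = build_word_list(records)
print(suggest("libary", word_list))
```

The cutoff of 0.7 is an arbitrary assumption; a lower value returns more, and looser, candidates.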
I could make this spell checking functionality available as a Web service.
The URL could look something like this:
The output could look something like this:
It would then be up to the client to do with the content of the spelling
elements as they desired. For example, the client could:
* spell check a document
* implement a Did You Mean? service a la Google
* incorporate the results into a Find More Like This One search
* enhance the results of an OPAC search
* feed selected words back to the spelling server
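
As one possible sketch of the Did You Mean? case from the list above, a client might pull the spelling elements out of the response and offer the first one back to the user. Only the "spelling" element name comes from this message; the wrapping "spellings" element, the sample response, and the did_you_mean function are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; the real output format is not defined here.
SAMPLE_RESPONSE = """<spellings>
  <spelling>library</spelling>
  <spelling>libraries</spelling>
</spellings>"""


def did_you_mean(query, response_xml):
    """Return a 'Did you mean' prompt, or None if there is nothing to suggest."""
    root = ET.fromstring(response_xml)
    candidates = [el.text.strip() for el in root.iter("spelling") if el.text]
    # Skip the prompt when the query is already one of the known spellings.
    if not candidates or query.lower() in (c.lower() for c in candidates):
        return None
    return f"Did you mean: {candidates[0]}?"


print(did_you_mean("libary", SAMPLE_RESPONSE))
```

This assumes the server returns its best guess first; a client that wanted to be more careful could check each candidate against its own index before showing it.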
Alternative URLs might include:
Writing the underlying script would be easy. Articulating an XML stream as
output would be harder.
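
For what it's worth, the XML side could start out very small. The sketch below is only one guess at a shape: the "spellings" root element, its "word" attribute, and the spellings_to_xml function are hypothetical, and only the "spelling" element name is taken from this message.

```python
import xml.etree.ElementTree as ET


def spellings_to_xml(word, alternatives):
    """Serialize a word and its alternative spellings as a small XML string."""
    # Root element and attribute names are assumptions, not an agreed format.
    root = ET.Element("spellings", {"word": word})
    for alternative in alternatives:
        ET.SubElement(root, "spelling").text = alternative
    # ElementTree escapes characters such as & and < on output.
    return ET.tostring(root, encoding="unicode")


print(spellings_to_xml("libary", ["library", "libraries"]))
```

Letting a library build the stream, rather than concatenating strings, takes care of escaping, which is where much of the difficulty tends to hide.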
What do y'all thinque? It would be fun at the very least.
Eric Lease Morgan
University Libraries of Notre Dame