Sweet! 

I've been asked to create a "smart search" a.k.a "be like google without any
of the resources" ;) for our upcoming new library home page.  One of the
things they're looking for is a "did you mean?" -- but plugging into the
google api only nets me 1,000 hits a day.  Now, granted, I don't think our
library home page will suffer that much from _only_ 1,000 hits, but I'd
rather build something that can scale than not.

If it's a web service, it'd be a snap to integrate into my app - so I'm very
interested =).

I did see something on #code4lib about an ockham server?  
Would/should that be part of this?

Andrew

Andrew Forman
University of Iowa Libraries
ISST Development
319 335 9152

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric
Lease Morgan
Sent: Tuesday, September 13, 2005 8:45 AM
To: [log in to unmask]
Subject: [CODE4LIB] spelling server

What do y'all think of the idea of a spelling server -- a Web service taking
a word as input and returning a list of alternative spellings?

MyLibrary@Ockham has indexed about 430,000 OAI records. These records have
been grossly classified into a number of domains such as mathematics, life
science, theses & dissertations, and a master domain consisting of all the
sub domains.

Taking a hint from Bill Mosely (of swish-e fame), I have read the indexes,
parsed out the individual words, and fed them to GNU ASPELL, a dictionary
program. It is then possible to query ASPELL and have it return alternative
spellings. We have incorporated this feature into [log in to unmask]
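
As a rough illustration of the idea (not the actual Ockham code), the sketch below pulls the distinct words out of some record text and suggests near matches for a query. Python's standard difflib stands in for GNU ASPELL here, so its ranking will differ from ASPELL's; the sample records and the names build_vocabulary and suggest are assumptions made for the example.

```python
import difflib
import re


def build_vocabulary(records):
    """Collect the distinct lowercase words found in an iterable of strings."""
    words = set()
    for text in records:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(words)


def suggest(word, vocabulary, limit=5):
    """Return up to `limit` vocabulary words that resemble `word`.

    difflib is a stand-in for ASPELL, not an equivalent of it.
    """
    return difflib.get_close_matches(word.lower(), vocabulary, n=limit, cutoff=0.6)


# Hypothetical sample data standing in for the indexed OAI records.
records = ["Origami and kirigami in mathematics", "Theses on paper folding"]
vocabulary = build_vocabulary(records)
print(suggest("origammi", vocabulary))
```

Building the vocabulary from the indexed records, rather than from a general dictionary, means suggestions are limited to words that actually occur in the collection.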

I could make this spell checking functionality available as a Web service.
The URL could look something like this:

   http://spell.ockham.org/?word=origami

The output could look something like this:

<?xml version='1.0'?>
<spell>
     <word>origami</word>
     <spellings>
         <spelling>origem</spelling>
         <spelling>irrigam</spelling>
         <spelling>obrigam</spelling>
         <spelling>kirigami</spelling>
         <spelling>ariguama</spelling>
     </spellings>
</spell>
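
A client could read such a response with very little code. The sketch below assumes the response has exactly the shape proposed above; the function name parse_spellings is made up for the example, and the service itself does not exist yet.

```python
import xml.etree.ElementTree as ET

# The proposed response format, copied from the example above.
SAMPLE = """<?xml version='1.0'?>
<spell>
     <word>origami</word>
     <spellings>
         <spelling>origem</spelling>
         <spelling>irrigam</spelling>
         <spelling>obrigam</spelling>
         <spelling>kirigami</spelling>
         <spelling>ariguama</spelling>
     </spellings>
</spell>"""


def parse_spellings(xml_text):
    """Return (queried word, list of alternative spellings)."""
    root = ET.fromstring(xml_text)
    word = root.findtext("word")
    spellings = [el.text for el in root.findall("./spellings/spelling")]
    return word, spellings


word, spellings = parse_spellings(SAMPLE)
print(word, spellings)
```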

It would then be up to the client to do with the content of the spelling
elements as they desired. For example, the client could:

   * spell check a document
   * implement a Did You Mean? service a la Google
   * incorporate the results into a Find More Like This One search
   * enhance the results of an OPAC search
   * feed selected words back to the spelling server
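
To make the Did You Mean? case concrete, here is one possible client-side sketch. It assumes the client already has the list of spellings and some way to count search hits; hit_count is a hypothetical callable supplied by the client, and the rule "only suggest when the query finds nothing" is just one reasonable policy, not part of the proposal.

```python
def did_you_mean(query, spellings, hit_count):
    """Return a suggested replacement for `query`, or None.

    `hit_count` is a hypothetical callable mapping a term to the number
    of search results the client's own index returns for it.
    """
    if hit_count(query) > 0:
        return None  # the query already finds something; no suggestion needed
    candidates = [s for s in spellings if s.lower() != query.lower()]
    # Prefer the alternative spelling that finds the most results.
    best = max(candidates, key=hit_count, default=None)
    if best is None or hit_count(best) == 0:
        return None
    return best


# A toy index standing in for a real search back end.
toy_index = {"origami": 42, "kirigami": 7}
print(did_you_mean("origammi", ["origami", "kirigami"], lambda t: toy_index.get(t, 0)))
```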

Alternative URLs might include:

   http://spell.ockham.org/?word=origami&domain=master
   http://spell.ockham.org/?word=origami&domain=master&version=1.0
   http://spell.ockham.org/?word=origami&domain=master&version=1.0&verbosity=5
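
A client might assemble these URLs along the following lines. This is only a sketch: the parameter names come from the examples above, but their meanings and defaults have not been settled, and spell_url is a made-up helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://spell.ockham.org/"


def spell_url(word, domain=None, version=None, verbosity=None):
    """Build a query URL, leaving out any optional parameter that is not set."""
    params = {"word": word}
    if domain is not None:
        params["domain"] = domain
    if version is not None:
        params["version"] = version
    if verbosity is not None:
        params["verbosity"] = verbosity
    # urlencode escapes the values, so words with spaces or accents are safe.
    return BASE_URL + "?" + urlencode(params)


print(spell_url("origami", domain="master", version="1.0", verbosity=5))
```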

Writing the underlying script would be easy. Articulating an XML stream as
output would be harder.
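
For what it is worth, producing the XML need not be much work if a library does the escaping. The sketch below builds the proposed response with Python's standard ElementTree; the function name spell_response is an assumption, and the real service might well be written in another language.

```python
import xml.etree.ElementTree as ET


def spell_response(word, spellings):
    """Serialize a word and its alternative spellings in the proposed format."""
    root = ET.Element("spell")
    ET.SubElement(root, "word").text = word
    container = ET.SubElement(root, "spellings")
    for spelling in spellings:
        ET.SubElement(container, "spelling").text = spelling
    # tostring() escapes characters such as & and < in the text for us.
    return "<?xml version='1.0'?>\n" + ET.tostring(root, encoding="unicode")


print(spell_response("origami", ["origem", "kirigami"]))
```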

What do y'all thinque? It would be fun at the very least.

--
Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604