This looks neat. It would be fun to try to tabulate and analyze the
choices that users make at step 5 and then, when you've got a sufficient
sample, have an "I'm feeling lucky" button that gives you a search with
the fields weighted according to the most commonly selected criteria.
(This assumes the indexing in step 2 allows some sort of weighted
searching, of course.) If it works, this would save users from having to make
microchoices before they get the "more like" results.
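
Here is a rough sketch of the tabulation in Python, just to make the idea
concrete. It assumes each step 5 submission is logged as a set of field
names. The function names, the min_sample threshold, and the simple
proportional weighting are all my own placeholders, not anything in
Eric's design.

```python
from collections import Counter


def field_weights(selections, min_sample=100):
    """Turn logged step 5 choices into per-field weights.

    selections: list of sets of field names, one set per user submission.
    Returns None until at least min_sample submissions have been logged.
    """
    if len(selections) < min_sample:
        return None  # sample too small to trust yet
    counts = Counter(field for chosen in selections for field in chosen)
    total = sum(counts.values())
    if total == 0:
        return None  # users submitted, but never selected a field
    # Weight = share of all selections that went to this field.
    return {field: n / total for field, n in counts.items()}


def lucky_query(record, weights):
    """Pair each field of the record with its weight, heaviest first."""
    clauses = [
        (field, record[field], weight)
        for field, weight in weights.items()
        if field in record
    ]
    return sorted(clauses, key=lambda clause: clause[2], reverse=True)
```

The (field, value, weight) triples would still have to be translated into
whatever boost syntax the step 2 indexer accepts, if it accepts any.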

Peter


> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On
> Behalf Of Eric Lease Morgan
> Sent: Tuesday, March 22, 2005 09:01 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] find similar service
>
> I have an idea for implementing a Find Similar service, and I
> would like to bounce these ideas off of y'all.
>
> One of my responsibilities as a part of the Ockham Project,
> is to demonstrate/implement a Find Similar service against
> National Science Foundation Digital Library (NSDL) content.
> Such a service will allow the user to select an item from a
> database, click a link, and the service will find other
> documents like the selected one. An age-old problem.
>
> Here is how I think I might implement it:
>
> 1. Create a large (more than 500,000 item) collection of NSDL
> metadata records by harvesting the NSF OAI Repository. Visit
> the following URL to see how the collection is being set up:
>
>    http://mylibrary.ockham.org/
>
> 2. Create indexes against the collection based on things like
> subjects, formats, institutions, etc. Thus I might have an
> index of biology stuff, mathematics stuff, articles, images,
> or just about any combination thereof.
>
> 3. Searches against the index(es) return the normal suspects:
> titles, creators, descriptions, and links to the full text.
>
> 4. Searches also return links labeled Find More Like This One.
>
> 5. After clicking the Find More Like This One link, the
> record is redisplayed allowing the user to select qualities
> of the record they find desirable: title, creator, format,
> words from the description, etc.
>
> 6. The user's selection is returned to the server, the system
> does some analysis, and returns alternative searches based on
> querying an underlying dictionary, the results of a WordNet
> search, or some other semantic analysis. These returned
> alternative queries allow the user to then search the same
> index, other indexes in the system, or even external indexes
> such as Wikipedia, Google, etc.
>
> In short, this approach to find similar is... similar to the
> pearl growing technique advocated more than a decade ago when
> mediated searching was a big topic in Library Land.
>
> What do y'all think?
>
> --
> Eric Lease Morgan
> University Libraries of Notre Dame
>
> (574) 631-8604
>