I have an idea for implementing a Find Similar service, and I would
like to bounce it off of y'all.

One of my responsibilities as part of the Ockham Project is to
demonstrate/implement a Find Similar service against National Science
Digital Library (NSDL) content. Such a service will allow
the user to select an item from a database, click a link, and the
service will find other documents like the selected one. An age old

Here is how I think I might implement it:

1. Create a large (more than 500,000 item) collection of NSDL metadata
records by harvesting the NSF OAI Repository. Visit the following URL
to see how the collection is being set up:

2. Create indexes against the collection based on things like subjects,
formats, institutions, etc. Thus I might have an index of biology
stuff, mathematics stuff, articles, images, or just about any
combination thereof.

3. Searches against the index(es) return the normal suspects: titles,
creators, descriptions, and links to the full text.

4. Searches also return links labeled Find More Like This One.

5. After clicking the Find More Like This One link, the record is
redisplayed allowing the user to select qualities of the record they
find desirable: title, creator, format, words from the description,

6. The user's selection is returned to the server, the system does some
analysis, and returns alternative searches based on querying an
underlying dictionary, the results of a WordNet search, or some other
semantic analysis. These returned alternative queries allow the user to
then search the same index, other indexes in the system, or even
external indexes such as Wikipedia, Google, etc.
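
To make steps 5 and 6 a bit more concrete, here is a minimal sketch in
Python of how the selected qualities might be turned into alternative
queries. Everything named here is an assumption for illustration only:
RELATED_TERMS is a tiny hand-made table standing in for the dictionary
or WordNet lookup, expand_term and alternative_queries are hypothetical
helpers, and the field:"value" AND ... query syntax is just a placeholder
for whatever the underlying indexer would actually accept.

```python
from itertools import product

# Hypothetical stand-in for a dictionary or WordNet lookup.
# A real service would query an actual thesaurus instead.
RELATED_TERMS = {
    "biology": ["life science", "zoology"],
    "cell": ["cellular"],
}


def expand_term(term):
    """Return the lower-cased term followed by any related terms."""
    key = term.lower()
    return [key] + RELATED_TERMS.get(key, [])


def alternative_queries(selected, limit=10):
    """Build alternative queries from the qualities a user selected.

    `selected` maps a field name (e.g. "subject") to the value the
    user picked from the redisplayed record. At most `limit` queries
    are returned.
    """
    fields = sorted(selected)
    choices = [expand_term(selected[field]) for field in fields]
    queries = []
    # Combine every original or related term for each selected field.
    for combo in product(*choices):
        clauses = ['{}:"{}"'.format(f, v) for f, v in zip(fields, combo)]
        queries.append(" AND ".join(clauses))
        if len(queries) == limit:
            break
    return queries


if __name__ == "__main__":
    picked = {"subject": "Biology", "format": "image"}
    for query in alternative_queries(picked):
        print(query)
```

Each returned string could then be offered as a link that runs the
query against the same index, another index in the system, or an
external search engine.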

In short, this approach to find similar is... similar to the pearl
growing technique advocated more than a decade ago when mediated
searching was a big topic in Library Land.

What do y'all think?

Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604