I have an idea for implementing a Find Similar service, and I would like to bounce it off of y'all.

One of my responsibilities as a part of the Ockham Project is to demonstrate/implement a Find Similar service against National Science Digital Library (NSDL) content. Such a service will allow the user to select an item from a database, click a link, and have the service find other documents like the selected one. It is an age-old problem. Here is how I think I might implement it:

1. Create a large (more than 500,000 item) collection of NSDL metadata records by harvesting the NSF OAI Repository. Visit the following URL to see how the collection is being set up:

   http://mylibrary.ockham.org/

2. Create indexes against the collection based on things like subjects, formats, institutions, etc. Thus I might have an index of biology stuff, mathematics stuff, articles, images, or just about any combination thereof.

3. Searches against the index(es) return the usual suspects: titles, creators, descriptions, and links to the full text.

4. Searches also return links labeled Find More Like This One.

5. After the user clicks the Find More Like This One link, the record is redisplayed, allowing the user to select the qualities of the record they find desirable: title, creator, format, words from the description, etc.

6. The user's selection is returned to the server, the system does some analysis, and it returns alternative searches based on querying an underlying dictionary, the results of a WordNet search, or some other semantic analysis. These returned alternative queries allow the user to then search the same index, other indexes in the system, or even external indexes such as Wikipedia, Google, etc.

In short, this approach to Find Similar is... similar to the pearl growing technique advocated more than a decade ago when mediated searching was a big topic in Library Land.

What do y'all think?

-- 
Eric Lease Morgan
University Libraries of Notre Dame
(574) 631-8604
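
P.S. To make step 6 a bit more concrete, here is a minimal sketch in Python of how the alternative searches might be generated. Everything in it is an assumption made for illustration: the RELATED_TERMS table is a made-up stand-in for the underlying dictionary or WordNet lookup, the alternative_queries function is hypothetical, and the field:"value" AND ... query syntax is a placeholder rather than the syntax of any particular indexer.

```python
from itertools import product

# Hypothetical stand-in for a dictionary or WordNet lookup. The entries
# are invented for illustration and do not come from a real thesaurus.
RELATED_TERMS = {
    "biology": ["life science", "zoology"],
    "images": ["photographs", "pictures"],
}


def alternative_queries(selected, related=RELATED_TERMS, limit=10):
    """Build alternative queries from the qualities a user selected.

    `selected` maps a field name (e.g. "subject") to the value the user
    picked on the redisplayed record. Each value is expanded to itself
    plus any related terms, and every combination becomes one query.
    """
    fields = list(selected)
    # For each field: the original value first, then its related terms.
    choices = [
        [selected[f]] + related.get(selected[f].lower(), []) for f in fields
    ]
    queries = []
    for combo in product(*choices):
        # Assumed query syntax: field:"value" joined with AND.
        queries.append(" AND ".join(f'{f}:"{v}"' for f, v in zip(fields, combo)))
    # The first combination is the user's own selection, so skip it.
    return queries[1:limit + 1]


if __name__ == "__main__":
    for q in alternative_queries({"subject": "biology", "format": "images"}):
        print(q)
```

Each returned string could then be offered as a link that reruns the search against the same index, another index in the system, or an external one. A real implementation would presumably rank or prune the combinations instead of listing them all.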