This looks neat. It would be fun to tabulate and analyze the choices
that users make at step 5 and then, once you have a sufficient sample,
offer an "I'm feeling lucky" button that runs a search with the fields
weighted according to the most commonly selected criteria. (This
assumes the indexing in step 2 allows some sort of weighted searching,
of course.) If it works, this would save users from having to make
microchoices before they get the "more like" results.

Peter

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On
> Behalf Of Eric Lease Morgan
> Sent: Tuesday, March 22, 2005 09:01 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] find similar service
>
> I have an idea for implementing a Find Similar service, and I
> would like to bounce these ideas off of y'all.
>
> One of my responsibilities as a part of the Ockham Project,
> is to demonstrate/implement a Find Similar service against
> National Science Foundation Digital Library (NSDL) content.
> Such a service will allow the user to select an item from a
> database, click a link, and the service will find other
> documents like the selected one. An age old problem.
>
> Here is how I think I might implement it:
>
> 1. Create a large (more than 500,000 item) collection of NSDL
> metadata records by harvesting the NSF OAI Repository. Visit
> the following URL to see how the collection is being set up:
>
> http://mylibrary.ockham.org/
>
> 2. Create indexes against the collection based on things like
> subjects, formats, institutions, etc. Thus I might have an
> index of biology stuff, mathematics stuff, articles, images,
> or just about any combination thereof.
>
> 3. Searches against the index(es) return the normal suspects:
> titles, creators, descriptions, and links to the full text.
>
> 4. Searches also return links labeled Find More Like This One.
>
> 5. After clicking the Find More Like This One link, the
> record is redisplayed allowing the user to select qualities
> of the record they find desirable: title, creator, format,
> words from the description, etc.
>
> 6. The user's selection is returned to the server, the system
> does some analysis, and returns alternative searches based on
> querying an underlying dictionary, the results of a WordNet
> search, or some other semantic analysis. These returned
> alternative queries allow the user to then search the same
> index, other indexes in the system, or even external indexes
> such as Wikipedia, Google, etc.
>
> In short, this approach to find similar is... similar to the
> pearl growing technique advocated more than a decade ago when
> mediated searching was a big topic in Library Land.
>
> What do y'all think?
>
> --
> Eric Lease Morgan
> University Libraries of Notre Dame
>
> (574) 631-8604
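
For what it's worth, here is a rough Python sketch of the tabulation
idea from the reply at the top of this message: count which fields
users tick at step 5, turn the counts into weights, and rank candidate
records by those weights. Everything in it is an assumption made for
illustration. The function names, the shape of the selection log, and
the "matched_fields" key are hypothetical, and nothing here reflects
how the Ockham indexes actually work.

```python
from collections import Counter


def field_weights(selections):
    """Turn logged step-5 selections into per-field weights that sum to 1.

    `selections` is a hypothetical log: one list of field names per
    "Find More Like This One" request.
    """
    # Count each field at most once per request.
    counts = Counter(field for chosen in selections for field in set(chosen))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {field: n / total for field, n in counts.items()}


def rank(candidates, weights):
    """Sort candidates by the summed weight of the fields they share
    with the selected record (highest score first)."""
    def score(candidate):
        return sum(weights.get(f, 0.0) for f in candidate["matched_fields"])
    return sorted(candidates, key=score, reverse=True)


# Made-up sample data, not real usage figures.
log = [
    ["title", "creator"],
    ["creator", "format"],
    ["creator"],
    ["title", "creator", "description"],
]
weights = field_weights(log)

candidates = [
    {"id": "a", "matched_fields": ["title", "format"]},
    {"id": "b", "matched_fields": ["creator"]},
]
for record in rank(candidates, weights):
    print(record["id"])
```

An "I'm feeling lucky" button could then skip step 5 and go straight to
the ranked results, provided the underlying index supports some form of
per-field weighting.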