"generate artificial serendipity"
Now my motto!
On Wed, Feb 16, 2011 at 11:02 AM, Simon Spero wrote:
> There's another source of data for training library relevance ranking that I
> don't think has been exploited much yet.
> (for academic libraries)
> Searches against catalogs are usually intended to locate material to fill a
> specific information need.
> Often this information seeking results in circulation events.
> Many systems can identify the person who conducted a search session.
> Comparing the search results to actual checkout events might be fruitful.
> For example, if a search for certain keywords resulted in checkout events
> for items other than those listed, but within shelf browsing distance,
> there may be a strong relationship between the words and the information
> need satisfied by those concepts.
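
To make the comparison above concrete, here is a rough sketch of how one might count keyword-to-item associations from session data. Everything in it is an assumption for illustration: the session record layout (`query`, `results`, `checkouts`), the `shelf_pos` map of item to a one-dimensional shelf position, and the `max_distance` cutoff standing in for "shelf browsing distance". It is not drawn from any particular ILS.

```python
from collections import Counter

def nearby_unlisted_checkouts(sessions, shelf_pos, max_distance=2.0):
    """Count (query term, item) pairs where the item was checked out,
    was NOT in the result list, but sits near a listed item on the shelf.

    sessions: list of dicts with hypothetical keys
              'query' (str), 'results' (list of item ids),
              'checkouts' (list of item ids).
    shelf_pos: hypothetical map of item id -> position along the shelf.
    """
    counts = Counter()
    for session in sessions:
        listed = set(session["results"])
        listed_positions = [shelf_pos[i] for i in listed if i in shelf_pos]
        terms = set(session["query"].lower().split())
        for item in session["checkouts"]:
            # Skip items the search already surfaced, or with unknown location.
            if item in listed or item not in shelf_pos:
                continue
            near = any(abs(shelf_pos[item] - p) <= max_distance
                       for p in listed_positions)
            if near:
                for term in terms:
                    counts[(term, item)] += 1
    return counts
```

Pairs with high counts across many sessions would be candidates for boosting that item when the term is searched. A real version would need a better notion of distance than a single number per item.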
> Incidentally, this is the kind of association that would be much easier to
> find if the LCSH hierarchy hadn't been so badly mangled by computer. If the
> hierarchy were intact it would be possible to aggregate subjects to deal
> with the sparseness of the circulation events.
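
The aggregation idea can be sketched as rolling checkout counts up through broader terms, so that sparse counts on narrow subjects pool at their ancestors. This assumes, for simplicity, that each subject has at most one broader term stored in a hypothetical `broader` map; real subject hierarchies can have several broader terms per heading, and the labels in the usage note are made up, not taken from LCSH.

```python
from collections import Counter

def roll_up(counts, broader):
    """Add each subject's checkout count to all of its ancestors.

    counts:  map of subject -> number of checkout events.
    broader: hypothetical map of subject -> its single broader subject.
    """
    totals = Counter(counts)
    for subject, n in counts.items():
        seen = {subject}            # guards against cycles in bad data
        parent = broader.get(subject)
        while parent is not None and parent not in seen:
            totals[parent] += n
            seen.add(parent)
            parent = broader.get(parent)
    return totals
```

For example, with counts of 2 and 1 on two sibling subjects that share a parent, the parent ends up with 3, which may be enough signal to work with even when each child alone is not.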
> Note that getting hold of this data may require working with central IT
> (e.g., if the library only has IP addresses, and the holder of that IP
> address at that time is known only via DHCP logs; or if the computers in the
> library require login, those logs may not be accessible to library systems
> staff directly). This kind of work should also go through the IRB, even if
> their approval is not explicitly required. They may have good ideas for
> avoiding possible privacy violations.
> If the physical layout of the library is known, you could also estimate scan
> radius. You could also calculate, based on checkouts of items seemingly
> unrelated to the search, from shelves passed on the way to the elevator, how
> to generate artificial serendipity by randomly throwing a few such items
> into the search results.
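
For what it's worth, the "randomly throwing a few such items into the search results" step might look something like the sketch below. It assumes the candidate pool (items checked out en route that looked unrelated to the search) has already been computed somehow; `add_serendipity`, `candidates`, and `k` are names invented here for illustration.

```python
import random

def add_serendipity(results, candidates, k=2, rng=None):
    """Return a copy of results with up to k candidate items mixed in
    at random positions. The relative order of the original results
    is preserved.

    candidates: hypothetical pool of 'serendipitous' item ids.
    rng: optional random.Random instance, for reproducible output.
    """
    rng = rng or random.Random()
    listed = set(results)
    # De-duplicate the pool and drop anything already in the results.
    pool = [c for c in dict.fromkeys(candidates) if c not in listed]
    picks = rng.sample(pool, min(k, len(pool)))
    mixed = list(results)
    for item in picks:
        mixed.insert(rng.randint(0, len(mixed)), item)
    return mixed
```

Keeping k small means the extra items stay a garnish instead of drowning the real hits.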
The Cherry Hill Company