A bunch of us are using Solr/Lucene for discovery over library bibliographic records, which is based on the basic tf*idf weighting algorithm, with a bunch of tweaks. So all of us doing that, and finding it pretty successful, are probably surprised to hear that this approach won't work on library data. :)

Jonathan

On 2/15/2011 4:13 PM, Dave Caroline wrote:
> I wrote my own search engine for my system and thought long and hard
> about relevancy, in the end went for none! and display alphabetical.
>
> Dave Caroline
>
> On Tue, Feb 15, 2011 at 8:32 PM, Till Kinstler <[log in to unmask]> wrote:
>> There is a vivid discussion about relevance ranking for library
>> resources in discovery interfaces in recent years. In articles, blog
>> posts and presentations on this topic, again and again possible ranking
>> factors are discussed beyond well known term statistic based methods
>> like the vector space retrieval model with tf*idf weighting (often after
>> claiming term statistics based approaches wouldn't work on library data,
>> of course without proving that).
>>
>> Usually the following possible factors are mentioned:
>> - popularity (often after stressing Google's success with PageRank),
>> measured in several ways like holding quantities, circulation
>> statistics, clicks in catalogues, explicit user ratings, number of
>> citations, ...
>> - freshness: rank newer items higher (ok, we have that in many old
>> school Boolean OPACs as "sort by date", but not in combination with
>> other ranking factors like term statistics)
>> - availability
>> - contextual/individual factors, e.g. if (user.status=student)
>> boost(textbook); if (user.faculty=economics) boost(Karl Marx); if
>> season=christmas boost(gingerbread recipes); ...
>> - ...
>>
>> I tried to find examples where such factors beyond term statistics are
>> used to rank search results in libraryland. But I hardly find them, only
>> lots of theoretical discussions about all the pros and cons of all
>> thinkable factors going on since the 1980s. I mean, all that is doable
>> with search engines like Solr today. But it seems it is hardly
>> implemented anywhere in real systems (beyond simple cases, for example
>> we slightly boost hits in collections a user has immediate online access
>> to, but we never asked users if they like it or notice at all).
>> WorldCat does a little something, it seems. They, of course, boost
>> resources with local holdings in WorldCat Local. And they use language
>> preferences (Accept-Language HTTP header) for boosting titles in users'
>> preferred languages. And there might be more in WorldCat ranking. But
>> there is not much published on that, it seems?
>>
>> So, if you implemented something beyond term statistics based ranking,
>> speak up and show. I am very interested in real world implementations
>> and experiences (like user feedback, user studies etc.).
>>
>> Thanks,
>> Till
>>
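
To make the idea in Till's list concrete, here is a rough sketch of how a plain tf*idf score might be combined with popularity, freshness and availability boosts. This is not how Solr/Lucene actually computes its scores, and nobody in this thread has described their setup this way. The function names, the record fields (holdings, year, available), the boost weights and the sample records are all made up for illustration.

```python
import math
from collections import Counter


def tfidf_scores(query, docs):
    """Score each document against the query with plain tf*idf (no length normalization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                idf = math.log(1 + n_docs / df[term])
                score += tf[term] * idf
        scores.append(score)
    return scores


def boosted_score(base, record, current_year=2011):
    """Multiply the term-statistics score by boost factors (all weights are assumptions)."""
    # Popularity: dampened holdings count, so heavily held items do not swamp everything.
    popularity = 1.0 + 0.1 * math.log1p(record["holdings"])
    # Freshness: newer items get a boost that decays with age in years.
    age = max(0, current_year - record["year"])
    freshness = 1.0 + 0.5 / (1.0 + age)
    # Availability: a flat boost for items the user can get right now.
    availability = 1.2 if record["available"] else 1.0
    return base * popularity * freshness * availability


def rank(query, records):
    """Return (score, title) pairs, best first."""
    base = tfidf_scores(query, [r["title"] for r in records])
    scored = [(boosted_score(b, r), r["title"]) for b, r in zip(base, records)]
    return sorted(scored, reverse=True)


# Hypothetical sample records, invented for this sketch.
records = [
    {"title": "introduction to economics", "holdings": 3, "year": 1985, "available": False},
    {"title": "economics textbook for students", "holdings": 40, "year": 2010, "available": True},
    {"title": "gingerbread recipes", "holdings": 5, "year": 2009, "available": True},
]

ranked = rank("economics", records)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

Because the boosts are multiplied in, a record that does not match the query at all still scores zero, so the extra factors only reorder hits that term statistics already found. Whether users notice or like the reordering is exactly the open question Till raises.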