There has been a lively discussion in recent years about relevance
ranking for library resources in discovery interfaces. Articles, blog
posts and presentations on this topic discuss, again and again, possible
ranking factors beyond well-known term-statistics-based methods like the
vector space retrieval model with tf*idf weighting (often after claiming
that term-statistics-based approaches would not work on library data, of
course without proving it).

Usually the following possible factors are mentioned:
- popularity (often after stressing Google's success with PageRank),
measured in several ways like holding quantities, circulation
statistics, clicks in catalogues, explicit user ratings, number of
citations, ...
- freshness: rank newer items higher (ok, we have that in many old
school Boolean OPACs as "sort by date", but not in combination with
other ranking factors like term statistics)
- availability
- contextual/individual factors, e.g. if (user.status=student)
boost(textbook); if (user.faculty=economics) boost(Karl Marx); if
(season=Christmas) boost(gingerbread recipes); ...
- ...
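
To make the idea concrete, here is a minimal Python sketch of how such
factors could be multiplied onto a term-statistics score. Everything in
it is an assumption made for illustration: the Record fields, the
function names and all weights are hypothetical and not taken from any
existing system, and in practice the weights would have to be tuned and
tested with users.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    title: str
    text_score: float          # term-statistics score (e.g. tf*idf) from the search engine
    holdings: int              # popularity proxy: number of holding libraries
    year: int                  # publication year, used for freshness
    available: bool            # currently available to the user
    is_textbook: bool = False  # used for the contextual boost


def combined_score(rec: Record,
                   user_status: Optional[str] = None,
                   this_year: Optional[int] = None) -> float:
    """Multiply the text score by small boosts (all weights are made up)."""
    if this_year is None:
        this_year = date.today().year
    # Popularity: log-damped so that very large holding counts do not dominate.
    popularity = 1.0 + 0.1 * math.log1p(rec.holdings)
    # Freshness: the boost decays with the age of the item in years.
    age = max(0, this_year - rec.year)
    freshness = 1.0 + 0.2 / (1.0 + age / 10.0)
    # Availability: slight boost for items the user can get right now.
    availability = 1.1 if rec.available else 1.0
    # Context: boost textbooks for students.
    context = 1.2 if (user_status == "student" and rec.is_textbook) else 1.0
    return rec.text_score * popularity * freshness * availability * context


def rank(records: List[Record],
         user_status: Optional[str] = None,
         this_year: Optional[int] = None) -> List[Record]:
    """Sort records by combined score, best first."""
    return sorted(records,
                  key=lambda r: combined_score(r, user_status, this_year),
                  reverse=True)
```

Multiplicative boosts keep the term-statistics score as the main
signal: a record that matches the query badly cannot be pushed to the
top by popularity or freshness alone. Additive boosts would be an
equally plausible design choice.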

I tried to find examples where such factors beyond term statistics are
actually used to rank search results in libraryland. But I can hardly
find any, only lots of theoretical discussions, going on since the
1980s, about the pros and cons of every conceivable factor. All of this
is doable with search engines like Solr today. But it seems to be
rarely implemented in real systems (beyond simple cases; for example,
we slightly boost hits in collections a user has immediate online access
to, but we never asked users whether they like it or notice it at all).
WorldCat seems to do a little of this. They, of course, boost
resources with local holdings in WorldCat Local. And they use language
preferences (the Accept-Language HTTP header) to boost titles in users'
preferred languages. And there might be more to WorldCat's ranking. But
there does not seem to be much published on that?
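
A language-preference boost of this kind could look roughly like the
following Python sketch. It is not WorldCat's implementation, which is
not documented here; the function names, the max_boost weight and the
assumption that records carry two-letter language codes are all
hypothetical.

```python
from typing import Dict


def parse_accept_language(header: str) -> Dict[str, float]:
    """Parse an Accept-Language value into {primary language: q-value}."""
    prefs: Dict[str, float] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not preferred"
        # Keep only the primary subtag: "de-DE" becomes "de".
        lang = tag.strip().lower().split("-")[0]
        if lang and lang != "*":
            prefs[lang] = max(q, prefs.get(lang, 0.0))
    return prefs


def language_boost(record_language: str,
                   prefs: Dict[str, float],
                   max_boost: float = 0.2) -> float:
    """Multiplier for a record's score; 1.0 if the language is not preferred.

    Assumes record_language is a two-letter code such as "de" or "en";
    the max_boost weight is a made-up value for illustration.
    """
    return 1.0 + max_boost * prefs.get(record_language.lower(), 0.0)
```

With a header such as "de-DE,de;q=0.9,en;q=0.8", German titles would
get the full boost, English titles a somewhat smaller one, and titles in
all other languages none.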

So, if you have implemented something beyond term-statistics-based
ranking, please speak up and show it. I am very interested in real-world
implementations and experiences (like user feedback, user studies etc.).

Thanks,
Till