A bunch of us are using Solr/Lucene for discovery over library
bibliographic records, which is based on the standard tf*idf weighting
algorithm, with a bunch of tweaks. So those of us doing that, and
finding it pretty successful, are probably surprised to hear that this
approach won't work on library data. :)
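
For the curious, a minimal sketch of the classic Lucene shape of that
weighting (the real implementation adds length norms and query norms
on top of this):

import math

def tf(freq):
    # Lucene's classic term frequency: square root of the raw count
    return math.sqrt(freq)

def idf(num_docs, doc_freq):
    # inverse document frequency with +1 smoothing, as in Lucene's
    # DefaultSimilarity
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def term_score(freq, num_docs, doc_freq, boost=1.0):
    # simplified per-term score; idf enters twice in Lucene's practical
    # scoring (query weight * field weight), hence the square
    return tf(freq) * idf(num_docs, doc_freq) ** 2 * boost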
Jonathan
On 2/15/2011 4:13 PM, Dave Caroline wrote:
> I wrote my own search engine for my system and thought long and hard
> about relevancy; in the end I went for none and simply display
> results alphabetically.
>
> Dave Caroline
>
> On Tue, Feb 15, 2011 at 8:32 PM, Till Kinstler<[log in to unmask]> wrote:
>> There has been a lively discussion about relevance ranking for library
>> resources in discovery interfaces in recent years. Articles, blog
>> posts and presentations on the topic repeatedly discuss possible
>> ranking factors beyond well-known term-statistics-based methods such
>> as the vector space retrieval model with tf*idf weighting (often after
>> claiming that term-statistics-based approaches don't work on library
>> data, of course without proving it).
>>
>> Usually the following possible factors are mentioned (a combined
>> sketch follows the list):
>> - popularity (often after stressing Google's success with PageRank),
>> measured in several ways like holding quantities, circulation
>> statistics, clicks in catalogues, explicit user ratings, number of
>> citations, ...
>> - freshness: rank newer items higher (ok, we have that in many old
>> school Boolean OPACs as "sort by date", but not in combination with
>> other ranking factors like term statistics)
>> - availability
>> - contextual/individual factors, e.g. if (user.status=student)
>> boost(textbook); if (user.faculty=economics) boost(Karl Marx); if
>> (season=christmas) boost(gingerbread recipes); ...
>> - ...
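>>
>> As a rough sketch of how several of these factors can be layered on
>> top of term statistics (the field names and boost values here are
>> hypothetical; assumes Solr's edismax parser):
>>
>> import requests
>>
>> params = {
>>     "defType": "edismax",
>>     "q": "economic history",
>>     # term statistics still drive the base score over these fields
>>     "qf": "title^2 author subject fulltext",
>>     # freshness: recip(x,m,a,b) = a/(m*x+b), decaying with age in ms
>>     # (3.16e-11 is roughly 1 / milliseconds-per-year)
>>     "bf": "recip(ms(NOW,publish_date),3.16e-11,1,1)",
>>     # availability and popularity as additive boost queries
>>     "bq": ["availability:online^1.5", "loans:[10 TO *]^1.2"],
>>     "wt": "json",
>> }
>> response = requests.get("http://localhost:8983/solr/catalog/select",
>>                         params=params)
>>
>> How strong such boosts should be relative to the term score is exactly
>> the kind of question user studies would have to answer.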
>>
>> I tried to find examples where such factors beyond term statistics are
>> used to rank search results in libraryland. But I could hardly find
>> any, only lots of theoretical discussions, going on since the 1980s,
>> about the pros and cons of every thinkable factor. All of this is
>> doable with search engines like Solr today, but it seems to be hardly
>> implemented anywhere in real systems (beyond simple cases; for example,
>> we slightly boost hits in collections a user has immediate online
>> access to, but we never asked users whether they like it or notice it
>> at all).
>> WorldCat seems to do a little of this. They, of course, boost
>> resources with local holdings in WorldCat Local. And they use language
>> preferences (the Accept-Language HTTP header) to boost titles in
>> users' preferred languages. There might be more in WorldCat's ranking,
>> but not much seems to have been published on it.
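>>
>> A minimal sketch of that Accept-Language idea (the tag-to-MARC-code
>> mapping and the "language" field are assumptions):
>>
>> def language_boosts(accept_language):
>>     # map HTTP language tags to MARC language codes (hypothetical
>>     # subset)
>>     tag_to_marc = {"de": "ger", "en": "eng", "fr": "fre"}
>>     boosts = []
>>     for i, part in enumerate(accept_language.split(",")):
>>         # strip quality values ("en;q=0.8") and regions ("de-DE")
>>         tag = part.split(";")[0].strip().split("-")[0].lower()
>>         code = tag_to_marc.get(tag)
>>         if code:
>>             # earlier (more preferred) entries get a stronger boost
>>             boosts.append("language:%s^%.1f"
>>                           % (code, max(2.0 - i * 0.5, 1.1)))
>>     return boosts
>>
>> # e.g. language_boosts("de, en;q=0.8")
>> # -> ["language:ger^2.0", "language:eng^1.5"]
>>
>> These would go into the same bq parameter as in the sketch above.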
>>
>> So, if you have implemented something beyond term-statistics-based
>> ranking, please speak up and show it. I am very interested in
>> real-world implementations and experiences (user feedback, user
>> studies, etc.).
>>
>> Thanks,
>> Till
>>