On Apr 11, 2006, at 4:11 PM, Colleen Whitney wrote:
> Jonathan Rochkind wrote:
>>> not the right approach. And yet...I wish I could explain why it
>>> seems as though the clustering can tell us something.
>> Well, what is it you think the clustering can tell you something
>> _about_? This is an interesting topic to me.
>> I'm not sure the clustering can tell you anything about relevance to
>> the user. I'm not seeing it. I mean, the number of items that are
>> members of a FRBR work set really just indicates how many 'versions'
>> (to be imprecise) of that work exist. But the number of 'versions' of
>> a work that exist doesn't really predict how likely that work (or any
>> of its versions) is to be of interest to a user, does it? But maybe
>> you're thinking of something I'm missing, I'm curious what you're
>> thinking about.
> Yes, that's exactly what I'm stuck on. If "more important" or "more
> popular" works tend to have more manifestations, then there might be
> some signal as to probability of relevance in there. Which could be
> factored in (in some *small* way). But I'm not sure whether/how one
> would test that "if". At the moment you have me convinced that it's a
> red herring.

Perhaps there is something useful about grouping and highlighting
works that have a large number of manifestations. My gut tells me
that this would be more useful for a general audience than for
specialized researchers. But you don't necessarily have to factor
this into your default search relevance algorithm to expose it.

Just speculating, but could one use the term "classics" to describe
works with an exceedingly large number of manifestations? Maybe this
could be a useful post-search sort option. Or maybe you can define a
high-manifestation threshold for your collection... if the user's
search term matches any of these items, they are highlighted on the
search results page in a separate bucket. Perhaps some people would
appreciate such a filtering service.
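
To make the threshold idea a bit more concrete, here is a rough sketch in
Python. Everything in it is an assumption for illustration only: the record
layout (a `work_id` plus a `manifestation_count`), the function name
`split_classics`, the sample titles and counts, and the cutoff of 50 are all
hypothetical and not taken from any real catalog or FRBR implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff; each collection would have to choose its own value.
CLASSIC_THRESHOLD = 50


def split_classics(
    results: List[Dict],
    threshold: int = CLASSIC_THRESHOLD,
) -> Tuple[List[Dict], List[Dict]]:
    """Split relevance-ranked results into a 'classics' bucket and the rest.

    The incoming relevance order is preserved inside each bucket, so the
    default ranking algorithm itself is left untouched.
    """
    classics = [r for r in results if r["manifestation_count"] >= threshold]
    others = [r for r in results if r["manifestation_count"] < threshold]
    return classics, others


# Made-up sample data, assumed to be in relevance order already.
results = [
    {"work_id": "w1", "title": "Obscure Monograph", "manifestation_count": 1},
    {"work_id": "w2", "title": "Hamlet", "manifestation_count": 412},
    {"work_id": "w3", "title": "Regional Survey", "manifestation_count": 3},
    {"work_id": "w4", "title": "Moby Dick", "manifestation_count": 187},
]

# 'classics' would be shown in its own bucket on the results page.
classics, others = split_classics(results)
```

The point of doing it this way is that the bucket is computed after the
search has run, so the relevance ranking stays exactly as it was.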

This may also apply to other specialized search needs. Rather than
complicate (dilute?) your relevance algorithm by adding factors that
are relevant only to a particular audience, why not develop targeted
discovery services that complement the search results?