On Apr 11, 2006, at 4:11 PM, Colleen Whitney wrote:

> Jonathan Rochkind wrote:
>
>>> not the right approach. And yet...I wish I could explain why it
>>> seems as though the clustering can tell us something.
>>
>> Well, what is it you think the clustering can tell you something
>> _about_? This is an interesting topic to me.
>>
>> I'm not sure the clustering can tell you anything about relevance to
>> the user. I'm not seeing it. I mean, the number of items that are
>> members of a FRBR work set really just indicates how many 'versions'
>> (to be imprecise) of that work exist. But the number of 'versions' of
>> a work that exist doesn't really predict how likely that work (or any
>> of its versions) is to be of interest to a user, does it? But maybe
>> you're thinking of something I'm missing, I'm curious what you're
>> thinking about.
>
> Yes, that's exactly what I'm stuck on. If "more important" or "more
> popular" works tend to have more manifestations, then there might be
> some signal as to probability of relevance in there. Which could be
> factored in (in some *small* way). But I'm not sure whether/how one
> would test that "if". At the moment you have me convinced that it's a
> red herring.

Perhaps there is something useful about grouping and highlighting works
that have a large number of manifestations. My gut tells me that this
would be more useful for a general audience than for specialized
researchers. But you don't necessarily have to factor this into your
default search relevance algorithm to expose it.

Just speculating, but could one use the term "classics" to describe
works with an exceedingly large number of manifestations? Maybe this
could be a useful post-search sort option. Or maybe you could define a
high-manifestation threshold for your collection: if the user's search
term matches any works above that threshold, they are highlighted on the
search results page in a separate bucket. Perhaps some people would
appreciate such a filtering service.

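To make the threshold idea a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the (record_id, work_id) record layout, the function names, and the threshold of 3 are made up and do not reflect any actual catalog system.

```python
from collections import Counter


def manifestation_counts(records):
    """Count manifestations per work.

    `records` is an iterable of (record_id, work_id) pairs, a
    hypothetical layout chosen only for this sketch.
    """
    return Counter(work_id for _, work_id in records)


def split_results(result_work_ids, counts, threshold):
    """Split ranked search results into (highlighted, others).

    Works with at least `threshold` manifestations go into the
    highlighted bucket. The original ranking order is preserved in
    both buckets, so the relevance algorithm itself is untouched.
    """
    highlighted = [w for w in result_work_ids if counts[w] >= threshold]
    others = [w for w in result_work_ids if counts[w] < threshold]
    return highlighted, others


# Toy collection: three records clustered under one work, one under another.
records = [
    ("r1", "hamlet"),
    ("r2", "hamlet"),
    ("r3", "hamlet"),
    ("r4", "obscure-pamphlet"),
]
counts = manifestation_counts(records)

# Search results in relevance order, as returned by the existing ranker.
ranked = ["obscure-pamphlet", "hamlet"]
highlighted, others = split_results(ranked, counts, threshold=3)
```

The point of keeping this as a post-search step is that the default ranking stays as it is; the bucket is just an extra view on top of it.
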
This may also apply to other specialized search needs. Rather than
complicate (dilute?) your relevance algorithm by adding factors that are
relevant only to a particular audience, why not develop targeted
discovery services that complement the search results?

Tito Sierra