Great input... thanks for the discussion. A few notes on what we're seeing
on our end:

a) We've just finished testing a content-based relevance ranking method
with users; it works well for academic users, and will probably work
better with some tweaking of field weights. I'll have more specific
results in a month or two.

b) We experimented with "boosting" some documents using holdings data.
OCLC gave us both WorldCat-wide and UC-wide numbers to work with.
David's observation is right on... the mix in WorldCat skews things for
academic libraries, so we rejected that and used UC system-wide numbers
in our experiments.

And I do agree with Jonathan's excellent summary... summing the scores is
not the right approach. And yet... I wish I could explain why it seems as
though the clustering can tell us something.

--Colleen

David Walker wrote:

>The only tricky thing about this with WorldCat, though, is that you have
>such a large mix of libraries.
>
>In my own searching on WorldCat, I've noticed that a fair amount of
>fiction and non-scholarly works appear near the top of results because
>the public libraries are skewing the holdings of those titles.
>
>Not a bad thing in itself, if that's what I'm looking for, but our
>students are looking for scholarly works (and still learning to
>distinguish scholarly from not), so would be nice in our particular
>context to limit only to academic libraries that own the title.
>
>--Dave
>
>=========================
>David Walker
>Web Development Librarian
>Library, Cal State San Marcos
>760-750-4379
>http://public.csusm.edu/dwalker
>=========================
>
>-----Original Message-----
>From: Code for Libraries On Behalf Of Hickey,Thom
>Sent: Monday, April 10, 2006 12:52 PM
>Subject: Re: [CODE4LIB] Question re: ranking and FRBR
>
>I'd agree with this.
>
>Actually, though, 'relevancy' ranking based on where terms occur in the
>record and how many times they occur is of minor help compared to some
>sort of popularity score. WorldCat holdings work fairly well for that,
>as should circulation data. The primary example of this sort of ranking
>is the web search engines where ranking is based primarily on word
>proximity and links.
>
>--Th
>
>-----Original Message-----
>From: Code for Libraries On Behalf Of Jonathan Rochkind
>Sent: Monday, April 10, 2006 3:16 PM
>Subject: Re: [CODE4LIB] Question re: ranking and FRBR
>
>When you are ranking on number of holdings like OCLC is, a straight
>sum makes sense to me--the sum of all libraries holding copies of
>any manifestation of the FRBR work is indeed the sum of the holdings
>for all the records in the FRBR work set. Of course.
>
>If you're doing relevancy rankings instead, though, a straight sum
>makes less sense. A relevancy ranking isn't really amenable to being
>summed. The sum of the relevancy rankings for various
>manifestations/expressions is probably not a valid indicator of
>how relevant the work is to the user, right? And if you did it this
>way, it would tend to make the most _voluminous_ work always come out
>first as the most 'relevant', which isn't quite right. This isn't
>quite the same problem as OCLC's having the Bible come out on
>top--since OCLC is ranking by holdings, it's exactly right to have
>the Bible come out on top; the Bible is indeed surely one of the
>(#1?) most held works, so it's quite right for it to be on top. But
>the Bible isn't always going to be the most relevant result for a
>user, just because it's the most voluminous! Summing is going to
>mess up your relevancy rankings.
>
>Just using the maximum relevancy ranking from the work set seems
>acceptable to me--the work's relevancy to the user is indicated by
>the most relevant manifestation in the set.
>There might be a better
>way to do it (Is a work with four manifestations with a relevancy
>ranking .7 more relevant than a work with just one manifestation with
>a ranking of .9? I don't think it probably is, actually; I think
>just taking the maximum should work fine. But it depends on the
>relevancy algorithm maybe.). I don't think I'm enough of a
>mathematician to understand the point of the log of the sum, though,
>hmm.
>
>--Jonathan
>
>At 2:38 PM -0400 4/10/06, Hickey,Thom wrote:
>
>>We're doing straight sums of the holdings of all the manifestations in
>>the work. It's hard for me to see the need to discount holdings in
>>multiple manifestations. It does mean that 'bible' tends to come to the
>>top for many searches, but that's about the only work-set I see coming
>>up unexpectedly to the top.
>>
>>If we had circulation data we'd certainly factor that in (or maybe just
>>use it if it was comprehensive enough).
>>
>>--Th
>>
>>-----Original Message-----
>>From: Code for Libraries On Behalf Of Colleen Whitney
>>Sent: Monday, April 10, 2006 2:04 PM
>>Subject: Re: [CODE4LIB] Question re: ranking and FRBR
>>
>>Thanks...is it just a straight sum, Thom?
>>
>>--C
>>
>>Hickey,Thom wrote:
>>
>>>Here at OCLC we're ranking based on the holdings of all the records in
>>>the retrieved work set. Seems to work pretty well.
>>>
>>>--Th
>>>
>>>-----Original Message-----
>>>From: Code for Libraries On Behalf Of Colleen Whitney
>>>Sent: Monday, April 10, 2006 1:06 PM
>>>Subject: [CODE4LIB] Question re: ranking and FRBR
>>>
>>>Hello all,
>>>
>>>Here's a question for anyone who has been thinking about or working with
>>>FRBR for creating record groupings for display.
>>>(Perhaps others have
>>>already discussed or addressed this...in which case I'd be happy to have
>>>a pointer to resources that are already out there.)
>>>
>>>In a retrieval environment that presents ranked results (ranked by
>>>record content, optionally boosted by circulation and/or holdings), how
>>>could/should FRBR-like record groupings be factored into ranking?
>>>Several approaches have been discussed here:
>>> - Rank the results using the score from the highest-scoring record in a
>>>   group
>>> - Use the sum of scores of documents in a group (this seems to me to
>>>   place too much weight on the group)
>>> - Use the log of the sum of the scores of documents in a group
>>>
>>>I'd be very interested in knowing whether others have already been
>>>thinking about this....
>>>
>>>Regards,
>>>
>>>--Colleen Whitney
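
For anyone who wants to try the three grouping approaches from the original
question side by side, here is a minimal Python sketch. It is only an
illustration, not anyone's production code: the names group_score and
rank_works, and the work_a / work_b identifiers, are made up for this
example, and the scores are the hypothetical .7 and .9 values from
Jonathan's message.

```python
import math


def group_score(scores, method="max"):
    """Collapse the per-record relevance scores of one work set into one score.

    `method` is a hypothetical switch for the three approaches in the thread:
    "max" (highest-scoring record), "sum", and "log_sum" (log of the sum).
    """
    if not scores:
        raise ValueError("a work set needs at least one record score")
    if method == "max":
        return max(scores)
    if method == "sum":
        return sum(scores)
    if method == "log_sum":
        # Assumes the summed score is positive; log is undefined otherwise.
        return math.log(sum(scores))
    raise ValueError(f"unknown method: {method}")


def rank_works(work_sets, method="max"):
    """Return work identifiers ordered from highest to lowest group score."""
    return sorted(
        work_sets,
        key=lambda work_id: group_score(work_sets[work_id], method),
        reverse=True,
    )


# Jonathan's example: four manifestations at .7 versus one at .9.
work_sets = {
    "work_a": [0.7, 0.7, 0.7, 0.7],
    "work_b": [0.9],
}

for method in ("max", "sum", "log_sum"):
    print(method, rank_works(work_sets, method))
```

With these numbers, "max" puts work_b first, while "sum" and "log_sum" both
put work_a first. Because the logarithm is monotonic, the log of the sum
orders work sets exactly as the plain sum does when it is the only signal;
it would only change the outcome once the group score is combined with
something else, such as a holdings boost.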