LISTSERV 16.5 - CODE4LIB Archives

Right. The observation had more to do with how to order the items within
a workset. The visitor was suggesting that a combination of popularity
and currency ought to be considered for determining display. So between
titles, you could show those titles that were more widely held first.
Then within titles, you could show the most recent edition of the title
at the top -- independent of the number of holdings associated with that
particular edition.
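
As a rough illustration of that two-level ordering, here is a minimal Python sketch. It assumes each record already carries a work-set identifier; the Edition class, the work_id field, and the holdings counts are made up for the example (only the 1976 and 2002 dates come from the thread). Work sets are ranked by their combined holdings, and editions inside a work set are listed newest first regardless of their individual holdings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edition:
    work_id: str   # hypothetical work-set identifier
    title: str
    year: int      # publication year
    holdings: int  # number of libraries holding this edition (made-up values)


def order_results(editions):
    """Rank work sets by total holdings; within each, newest edition first."""
    worksets = defaultdict(list)
    for e in editions:
        worksets[e.work_id].append(e)
    # Between titles: popularity measured across all editions of the work.
    ranked = sorted(
        worksets.values(),
        key=lambda eds: sum(e.holdings for e in eds),
        reverse=True,
    )
    # Within a title: reverse chronological, ignoring per-edition holdings.
    return [sorted(eds, key=lambda e: e.year, reverse=True) for eds in ranked]


sample = [
    Edition("w1", "Title A (first edition)", 1976, 900),
    Edition("w1", "Title A (later edition)", 2002, 150),
    Edition("w2", "Title B", 1999, 400),
]

for workset in order_results(sample):
    for e in workset:
        print(e.year, e.title, e.holdings)
```

With this sample data the 2002 edition is listed above the 1976 one even though fewer libraries hold it, and both sit above Title B because their combined holdings are higher.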

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Jonathan Rochkind
Sent: Wednesday, April 12, 2006 11:50 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Question re: ranking and FRBR

Can you clarify: were all the editions grouped together in a 'work
set', or were the two editions you speak of separated by interfiled
items that were editions of entirely separate works? Because
grouping them together as a work set seems the first step, right?
Once you've done this, you're already way ahead of the game compared
to most current systems. If the question is what order to put the
items _within_ a work set in, reverse chronological seems as good as
any (especially for fiction), but clearly different patrons are going
to have different needs (somebody might be looking for the oldest one,
or for a particular one), and popularity might make as many people
happy as chronology. It's hard to say.

But yeah, measuring popularity against all the editions of a work is
exactly what Thom was advising, and saying WorldCat did.  Seems to me
as long as your system knows which items are related to each other in
a work set, the battle is half done. Most systems now don't, of
course. Even WorldCat has trouble with it, as Thom's Don Quixote
example showed. (Of course, a real solution has to come from changes
to the cataloging data itself, but I'm not holding my breath.)
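
To make the grouping step concrete, here is a small Python sketch of a naive work-set key built from normalized author and title strings. This is only an assumption about how one might do it, not how WorldCat or FictionFinder actually groups records; the work_key function is hypothetical, and the "Don Quixote." record with a trailing period is added purely for illustration. It also shows why title variations defeat this kind of matching.

```python
import re
import unicodedata
from collections import defaultdict


def work_key(author, title):
    """Naive, hypothetical grouping key: normalized (author, title) pair."""
    def norm(s):
        # Decompose accented characters, then drop the combining marks.
        s = unicodedata.normalize("NFKD", s)
        s = "".join(c for c in s if not unicodedata.combining(c))
        # Lowercase, replace punctuation with spaces, collapse whitespace.
        s = re.sub(r"[^a-z0-9 ]", " ", s.lower())
        return " ".join(s.split())
    return (norm(author), norm(title))


records = [
    ("Cervantes Saavedra, Miguel de", "Don Quixote"),
    ("Cervantes Saavedra, Miguel de", "Don Quixote."),
    ("Cervantes Saavedra, Miguel de", "The Adventures of Don Quixote"),
]

groups = defaultdict(list)
for author, title in records:
    groups[work_key(author, title)].append(title)

for key, titles in groups.items():
    print(key, titles)
```

The first two records collapse into one group, but "The Adventures of Don Quixote" lands in a group of its own, which is the kind of split that string matching alone cannot repair.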

[I do love talking about this stuff myself, hope everyone's not
getting bored or annoyed at the discussion. If there's some place
else people are having discussions like this I should go instead,
please let me know.]

--Jonathan Rochkind

>Another observation (from another visitor visiting OCLC).
>
>This concerned how to display results for texts with multiple editions.
>We conducted a search in which the first title in the result set was
>for the first edition of a work published in 1976.
>
>A more recent version of the title was released in 2002, but this
>version was buried further down the result set because fewer libraries
>had actually acquired it than owned a copy of the first version.
>
>The visitor thought it probably made more sense, in this case, to rank
>the versions in reverse chronological order by publication date so that
>the most recent version was at the top of the list, irrespective of the
>number of holdings for an edition.
>
>Perhaps, then, popularity might be measured collectively against all of
>the editions of a title, rather than solely against holdings for
>particular editions of the same title.
>
>Doug Loynes
>Director, Content Initiatives
>OCLC, Online Computer Library Center Inc.
>[log in to unmask]
>
>
>-----Original Message-----
>From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>Hickey,Thom
>Sent: Wednesday, April 12, 2006 10:49 AM
>To: [log in to unmask]
>Subject: Re: [CODE4LIB] Question re: ranking and FRBR
>
>A visitor here yesterday made an observation relevant to this
>discussion.  We were looking at the results of a search for "Don
>Quixote" in a yet-to-be-released version of FictionFinder.  The results
>were ranked by the number of libraries holding each 'work'.  Here's an
>abbreviated version of the results list:
>
>1. Don Quixote  / Cervantes Saavedra, Miguel de
>2. History of the Adventures of Joseph Andrews  / Fielding, Henry
>3. Morgenlandfahrt  / Hesse, Hermann
>4. The Ingenious Gentleman Don Quixote de la Mancha  / Cervantes
>Saavedra, Miguel de
>5. The Adventures of Don Quixote  / Cervantes Saavedra, Miguel de
>6. The First Part of the Delightful History of the Most Ingenious
>Knight Don Quixote of the Mancha  / Cervantes Saavedra, Miguel de
>
>Because of some title variations, not all the Don Quixotes are brought
>together.  The visitor's point, though, was that #2 by Fielding really
>shouldn't be ranked higher than 4, 5, & 6, which seem more closely
>related to the "Don Quixote" search than Fielding's (even though Joseph
>Andrews is related to Don Quixote).
>
>Of course this might be just the right ordering for someone, but in
>general an ordering that takes into account where the search terms
>occurred in the records, in addition to how popular the works are,
>should work better than one that ignores that information.
>
>--Th
>
>
>-----Original Message-----
>From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>Keith Jenkins
>Sent: Tuesday, April 11, 2006 2:49 PM
>To: [log in to unmask]
>Subject: Re: [CODE4LIB] Question re: ranking and FRBR
>
>A very interesting discussion here... so I'll support its funding with
>my own two cents.
>
>I'd argue that search relevance is a product of two factors:
>   A. The overall popularity of an item
>   B. The appropriateness to a given query
>
>Both are approximate measures with their own difficulties, but a good
>search usually needs to focus on both (unless B is so restrictive that
>we don't need A).
>
>B is always going to be inhibited, to various degrees, by the limited
>nature of the user's input--usually just a couple of words.  If a user
>isn't very specific, then it is indeed quite difficult to determine
>what would be most relevant to that user.  That's where A can really
>help to sort a large number of results (although B can also help
>sorting).  I think Thom makes a good point here:
>
>On 4/10/06, Hickey,Thom <[log in to unmask]> wrote:
>>  Actually, though, 'relevancy' ranking based on where terms occur in
>>  the record and how many times they occur is of minor help compared
>>  to some sort of popularity score.  WorldCat holdings work fairly
>>  well for that, as should circulation data.
>
>In fact, it was this sort of "popularity score" logic that originally
>enabled Google to provide a search engine far better than what was
>possible using just term placement and frequency metrics for each
>document.  Word frequency is probably useless for our short
>bibliographic records that are often cataloged at differing levels of
>completeness.  But I think it could still be useful to give more
>weight to the title and primary author of a book.
>
>The basic mechanism of Google's PageRank algorithm is this: a link
>from page X to page Y is a vote by X for Y, and the number of votes
>for Y determines the power of Y's vote for other pages.  We could
>apply this to FRBR records, if we think of every FRBR relationship as
>a two-way link.  In this way, all the items link to the
>manifestations, which link to the expressions, which link to the
>works.  All manner of derivative works would also be linked to the
>original works.  So the most highly-related works get ranked the
>highest.  (For the algorithmically-minded, I found the article "XRANK:
>Ranked Keyword Search over XML Documents" helpful in understanding how
>the PageRank algorithm can be applied to other situations:
>http://www.cs.cornell.edu/~cbotev/XRank.pdf )  It would be interesting
>to see how such an approach compares to a simple tally of "number of
>versions".
>
>-Keith
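
The two-way-link idea in Keith's message can be sketched in a few lines of Python. This is a plain power-iteration PageRank over a toy graph, not the XRANK method from the cited paper and not anything OCLC has implemented. The node identifiers, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration. Each FRBR relationship is entered once and expanded into links in both directions.

```python
def two_way(edges):
    """Expand each relationship (a, b) into links a -> b and b -> a."""
    links = {}
    for a, b in edges:
        links.setdefault(a, []).append(b)
        links.setdefault(b, []).append(a)
    return links


def pagerank(links, damping=0.85, iterations=50):
    """Basic power-iteration PageRank; links maps node -> list of targets."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                # A node splits its vote evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Nodes without links spread their vote over all nodes.
                for t in nodes:
                    new[t] += damping * rank[node] / n
        rank = new
    return rank


# Hypothetical identifiers: work -> expression -> manifestation, plus one
# work-to-work link for a derivative work.
edges = [
    ("work:quixote", "expr:quixote-es"),
    ("work:quixote", "expr:quixote-en"),
    ("expr:quixote-es", "manif:es-1"),
    ("expr:quixote-en", "manif:en-1"),
    ("expr:quixote-en", "manif:en-2"),
    ("work:quixote", "work:joseph-andrews"),
    ("work:joseph-andrews", "expr:andrews-en"),
    ("expr:andrews-en", "manif:andrews-1"),
]

ranks = pagerank(two_way(edges))
for node, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.4f}  {node}")
```

In this toy graph the work with more expressions and related works ends up with a higher score than the derivative work. A simple tally of "number of versions" would give a similar ordering here, so the comparison only becomes interesting on real data.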