Jonathan,

We are using the public FRBR algorithm developed by OCLC Research.

Since we have just loaded a limited set of OCA records into the xISBN
service, it might be interesting to illustrate what can be done in the
current system.

Suppose a user is interested in "The Golden Fleece and the Heroes Who Lived
Before Achilles" (ISBN 0689868847). He can limit the search to OCA by
issuing an xISBN request with "library=oca", such as:

http://xisbn.worldcat.org/webservices/xid/isbn/0689868847?library=oca&fl=*

This query limits the search scope to OCA, and the result returns an ISBN
match with a URL linking to: http://www.archive.org/details/goldenfleecehero00colu

Similarly, a user can request the same ISBN with the library limited to
"ebook", such as:
http://xisbn.worldcat.org/webservices/xid/isbn/0689868847?library=ebook&fl=*

It returns both the OCA match and a NetLibrary audiobook match.
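
For anyone scripting these requests, here is a minimal sketch of how the two
URLs above could be built. The helper name xisbn_url is hypothetical; only
the endpoint and the "library" and "fl" parameters come from the examples
above, and the code only builds the URL, it does not call the service.

```python
from urllib.parse import urlencode

# Endpoint taken from the example requests above.
XISBN_BASE = "http://xisbn.worldcat.org/webservices/xid/isbn/"


def xisbn_url(isbn, library=None, fl="*"):
    """Build an xISBN request URL (hypothetical helper, for illustration).

    library: optional scope such as "oca" or "ebook".
    fl: field list; "*" is the value used in the examples above.
    """
    params = {}
    if library is not None:
        params["library"] = library
    params["fl"] = fl
    # safe="*" keeps the asterisk literal, matching the example URLs.
    return XISBN_BASE + isbn + "?" + urlencode(params, safe="*")


print(xisbn_url("0689868847", library="oca"))
print(xisbn_url("0689868847", library="ebook"))
```

Omitting the library argument in this sketch simply leaves the "library"
parameter out of the query string.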

Given that we only have a very limited number of ISBN matches with OCA (over
1000 titles), the results are perhaps not yet good enough for practical use.
I believe they will improve significantly once we have xoclcnum in place.
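
To put rough numbers on that, here is a back-of-the-envelope sketch using the
counts from Tim's message quoted below. The OCLC numbers there are not yet
verified against WorldCat, so the second figure should be read as
approximate. The helper name share is hypothetical.

```python
# Counts as reported in Tim's quoted message.
TOTAL_TEXT_RECORDS = 290_756
WITH_ISBN = 1_773
WITH_OCLCNUM = 88_537  # not yet verified against WorldCat


def share(part, whole):
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


print(share(WITH_ISBN, TOTAL_TEXT_RECORDS))     # percent of records with an ISBN
print(share(WITH_OCLCNUM, TOTAL_TEXT_RECORDS))  # percent with an OCLC number
```

That works out to well under 1% of the harvested records carrying an ISBN,
against roughly 30% carrying an OCLC number.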

xiaoming


On Wed, Mar 12, 2008 at 5:36 PM, Jonathan Rochkind <[log in to unmask]> wrote:

> This is great stuff. I am interested in what algorithms you are using to
> group works. It sounds like you are doing that, above what OCA does
> (which is nothing, I think).  Have you gotten that far yet? What are you
> thinking? Oh wait, you're from OCLC, you guys have already got all sorts
> of stuff to do that, I guess.
>
> Jonathan
>
> Tim McCormick wrote:
> > In our office we too have been investigating the e-book material at
> > Internet Archive / OCA.
> >
> > We'd like to build just the sort of OCA index / id-switcher that Tim
> > Shearer and others have described on this list -- in order to, among
> > other things, add this type of capability to our xID (aka xISBN)
> > service, and to WorldCat.
> >
> > So, I thought I'd report on results so far, and what we're working on.
> >
> > Data:
> > 1) First, we used the Internet Archive's OAI interface to harvest
> > brief records of all items categorized as "text".  We found that this
> > yielded only very brief records, though -- author, title, and OCA
> > unique identifier (e.g. "northcarolinayea1910rale").
> > 2) Then we used the OCA identifier to check for, and harvest, MARC-XML
> > records when available, using the lookup method described by Chris
> > Freeland on Code4Lib on Feb 25.
> > 3) The MARC files were examined for ISBNs and OCLCnums.  (yes, we may
> > look for other identifiers later).
> >
> > That yielded:
> >   - 290,756 total OCA "text" records found
> >   - 198,826 of those had MARC records
> >   - 1,773 had ISBNs
> >   - 88,537 had OCLC numbers (identified by record position & format,
> > but not yet verified against WorldCat).
> >
> > Switching:
> > In xID we currently support ISBN, have recently added LCCN, and we
> > plan to release ISSN and OCLCnum support in upcoming releases.  So,
> > when those are fully phased in, the goal is that you could submit an
> > identifier of any supported type, and get back all identifiers of
> > whichever type that represent versions of the same "work";  or, when
> > appropriate, the same manifestation.
> >     Therefore, the 88,537 OCLCnums will likely map to a much larger
> > set of identifiers overall, allowing a lot of book records -- in
> > library catalogs or elsewhere -- to hook into OCA materials.
> >
> > Free-text service:
> > We imagine a service which, given an identifier, attempts to decide if
> > a free-text version of the described work is available at OCA/IA: and
> > if so, returns an access URL for that resource.
> >
> > Other work:
> > We are investigating the case of free/open resources that lack
> > standard identifiers -- for example, possibly, the 2/3 of IA texts for
> > which we didn't find OCLCnum or ISBN.  Here, we are looking at doing
> > "best-guess" lookup of related identifiers, based on author and title
> > information in the brief record.   This might allow substantially
> > broader indexing of open content materials, but the reliability of the
> > identifier association is lower.
> >
> > Any tips, questions, suggestions, requests are welcome.
> > thanks to Xiaoming Liu and Tom Ventimiglia in OCLC New Jersey office
> > for work on this.
> >
> > Tim
> >
> > --
> > Tim McCormick
> > Product Manager (xID), OCLC New Jersey
> > Email: mccormit (at) oclc.org
> > 2 Broad St., Suite 208, Bloomfield, New Jersey 07003 USA
> > Phone: +1.973.868.5694  |  Skype:  tim_mccormick
> > http://www.oclc.org/
> >
> >
>
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
>