Yes, Open Library implemented it, and, of course, where it doesn't work
is where the data is pretty bad. If you prefer to err on the side of
merging, you can loosen the algorithm's weights. It's based on the
algorithm used for the University of California union catalog, which
was in use for about 20 years. Last I was able to ascertain, its data
elements are very similar to the ones used (at least at the time) by
WorldCat.
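To illustrate what "loosening the weights" means in general, here is a minimal sketch of weighted field matching with a merge threshold. This is NOT the Open Library merge code: the field names, weights, threshold, and the plain-dict record format are all hypothetical, and the records are assumed to be already normalized.

```python
# Hypothetical sketch of weighted record matching; NOT the Open Library code.
# Records are plain dicts of already-normalized values (an assumption).

# Hypothetical weights: points gained when a field agrees, lost when it conflicts.
WEIGHTS = {"isbn": 85, "title": 40, "author": 30, "publisher": 20, "date": 15}


def match_score(a: dict, b: dict, weights: dict = WEIGHTS) -> int:
    """Add a field's weight when both records agree on it, subtract it when
    they disagree, and ignore fields missing from either record."""
    score = 0
    for field, weight in weights.items():
        va, vb = a.get(field), b.get(field)
        if va and vb:
            score += weight if va == vb else -weight
    return score


def is_duplicate(a: dict, b: dict, threshold: int = 100,
                 weights: dict = WEIGHTS) -> bool:
    """Treat two records as the same edition when the score reaches threshold."""
    return match_score(a, b, weights) >= threshold


if __name__ == "__main__":
    r1 = {"isbn": "9780316769488", "title": "catcher in the rye",
          "author": "salinger j d", "date": "1991"}
    r2 = {"title": "catcher in the rye", "author": "salinger j d", "date": "1991"}
    print(match_score(r1, r2))                 # 85: title + author + date
    print(is_duplicate(r1, r2))                # False at the default threshold
    print(is_duplicate(r1, r2, threshold=80))  # True once the threshold is loosened
```

Lowering the threshold (or raising the weights of fields you trust) is one way to err on the side of merging; the trade-off is more false merges wherever the data is bad.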
kc
On 8/22/13 1:21 PM, Michael Beccaria wrote:
> Karen,
> Do you have a sense of how well it actually works? Is Open Library implementing it?
>
> Mike Beccaria
> Systems Librarian
> Head of Digital Initiative
> Paul Smith's College
> 518.327.6376
> Become a friend of Paul Smith's Library on Facebook today!
>
>
> -----Original Message-----
> From: Code for Libraries On Behalf Of Karen Coyle
> Sent: Thursday, August 22, 2013 11:53 AM
> Subject: Re: [CODE4LIB] De-dup MARC Ebook records
>
> The record matching algorithm used by the Open Library is available here:
> https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/merge
>
> The original spec, which may have changed in the implementation, is here:
>
> http://kcoyle.net/merge.html
>
> kc
>
>
> On 8/22/13 8:07 AM, Michael Beccaria wrote:
>> Steve,
>> I don't think it's so much about finding a control field (although the closest match I can use is the ISBN or eISBN, which has its issues) as about normalizing the data in the fields so that matches are produced. It will no doubt take some time to figure out.
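As a rough illustration of that normalization step, a minimal sketch that turns raw ISBN and title strings into comparable match keys. It assumes the raw strings come from something like each record's 020 $a and 245 $a; the function names are hypothetical. Converting ISBN-10 to ISBN-13 makes the two forms of the same number compare equal, but a print ISBN and an eISBN are different numbers, so this alone will not match a print record to its ebook.

```python
import re
import unicodedata
from typing import Optional


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a bare 10-character ISBN to ISBN-13: prefix 978, drop the old
    check character, and compute the new mod-10 check digit."""
    core = "978" + isbn10[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Reduce a value like '0-316-76948-7 (pbk.)' to a bare ISBN-13.
    Check digits are not validated; unusable values return None."""
    # Use the first token containing a digit, which drops qualifiers like '(pbk.)'.
    token = next((t for t in raw.split() if any(c.isdigit() for c in t)), "")
    chars = re.sub(r"[^0-9Xx]", "", token).upper()
    if len(chars) == 10 and chars[:9].isdigit():
        return isbn10_to_isbn13(chars)
    if len(chars) == 13 and chars.isdigit():
        return chars
    return None


def normalize_title(raw: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())
```

For example, `normalize_isbn("0-316-76948-7 (pbk.)")` and `normalize_isbn("978-0-316-76948-8")` both yield `9780316769488`.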
>>
>>
>> -----Original Message-----
>> From: Code for Libraries On Behalf Of McDonald, Stephen
>> Sent: Friday, August 16, 2013 8:16 AM
>> Subject: Re: [CODE4LIB] De-dup MARC Ebook records
>>
>> Michael Beccaria said:
>>> Thanks for the replies. To clarify, I am working with two (or, in the
>>> future, more) sets of MARC records outside of the ILS. I've tried using
>>> MarcEdit, but my mileage did vary...there was not much overlap in the
>>> control fields that were available to me. I have a feeling they are a
>>> bit varied. I'm also messing around with marcXimiL a little, but I'm
>>> having trouble getting it to output any records at all. I also looked
>>> at the XC aggregation module, but I had trouble getting that to work
>>> properly as well, and the listserv was unresponsive. It seemed like
>>> good software, but it required me to set up an OAI harvest source to
>>> allow it to ingest the records, and that...well...enough is
>>> enough... I think I will probably need to write something, and at
>>> least that way I know what it will be doing rather than plowing
>>> through software that has little to no support. Please feel free to let me know of any particular strategy you think might work best here...
>> If you couldn't get adequate deduping from the control fields available in MarcEdit's deduping tool, which fields do you think you need to dedup on? You can actually specify any arbitrary field and subfield for deduping in MarcEdit.
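Outside MarcEdit, the same idea (pick any field and subfield, normalize its values, and treat records that share a value as duplicates) can be sketched in a few lines of Python. The record structure below is a hypothetical plain-dict stand-in, not pymarc's or MarcEdit's, and the default normalizer (trim and lowercase) is an assumption; a stricter one can be passed in.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical minimal record shape (not pymarc or MarcEdit):
# {tag: [ {subfield_code: value, ...}, ... one dict per field occurrence ]}
Record = Dict[str, List[Dict[str, str]]]


def field_values(record: Record, tag: str, code: str) -> List[str]:
    """Every value of subfield `code` across all occurrences of field `tag`."""
    return [sf[code] for sf in record.get(tag, []) if code in sf]


def dedup_by(records: List[Record], tag: str, code: str,
             normalize: Callable[[str], str] = lambda v: v.strip().lower()
             ) -> Tuple[List[Record], List[Tuple[Record, Record]]]:
    """Keep the first record seen for each normalized tag/subfield value.

    Returns (kept, duplicates), where each duplicate is paired with the kept
    record it matched. Records with no usable key are always kept.
    """
    seen: Dict[str, Record] = {}
    kept: List[Record] = []
    dupes: List[Tuple[Record, Record]] = []
    for rec in records:
        keys = sorted({normalize(v) for v in field_values(rec, tag, code)} - {""})
        match = next((seen[k] for k in keys if k in seen), None)
        if match is None:
            kept.append(rec)
            for k in keys:
                seen[k] = rec
        else:
            dupes.append((rec, match))
            # Let any extra keys on the duplicate point at the same kept record.
            for k in keys:
                seen.setdefault(k, match)
    return kept, dupes
```

Running it on 245 $a instead of 020 $a shows why normalization matters: "The catcher in the rye" and "Catcher in the rye" do not match under this simple trim-and-lowercase normalizer.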
>>
>> Steve McDonald
> --
> Karen Coyle
> http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet