Michael - I'm just about to load ebook records into our Innovative catalog, and I'm going to keep the ebook records separate from the print book records.  For ebooks, I'm going to copy the OCLC number to the 901 with a prestamp, and overlay on that, so only records loaded with our ebook load table will have this 901 to overlay on.  Then I'm going to protect the 856s and the 710s for the ebook collection statement.  That'll take care of adds.  For deletes, I haven't got that worked out yet.  I think there's a way to delete a field based on the incoming field.

Cindy Harper
Virginia Theological Seminary
[log in to unmask]

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Andy Kohler
Sent: Thursday, August 15, 2013 2:29 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] De-dup MARC Ebook records

Are you expecting to work with two files of records, outside of your ILS?
If so, for a project like that I'd probably write Perl script(s) using MARC::Record (there are similar code libraries for Ruby, Python and Java at least).

For each record in each file, use the ISBN (and/or OCLC number and/or LCCN) as a key.  Compare all sets, and keep one record per key.

This assumes that the vendors are supplying records with standard identifiers, and not just their own record numbers.
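
The key-based comparison described above could be sketched roughly as follows. This is only an illustration in Python using the standard library, not a finished tool: it assumes the identifiers have already been pulled out of each record (for example 020 $a and 035 $a, read with MARC::Record or a similar library) into plain dictionaries, and the dictionary layout and the function names (normalize_isbn, normalize_oclc, record_keys, dedupe) are made up for this example.

```python
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

Record = Dict[str, object]  # hypothetical layout: {"source": str, "isbns": [...], "oclc": [...]}


def isbn10_to_13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 13-digit form (978 prefix, new check digit)."""
    core = "978" + isbn10[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Strip qualifiers and hyphens from an 020 $a value; return a 13-digit ISBN or None."""
    match = re.match(r"\s*([0-9][0-9\-]{8,}[0-9Xx])", raw)
    if not match:
        return None
    digits = match.group(1).replace("-", "").upper()
    if len(digits) == 10:
        return isbn10_to_13(digits)
    if len(digits) == 13:
        return digits
    return None


def normalize_oclc(raw: str) -> Optional[str]:
    """Reduce an 035 $a value such as '(OCoLC)ocm00012345' to the bare number '12345'."""
    match = re.search(r"\(OCoLC\)\D*0*(\d+)", raw)
    return match.group(1) if match else None


def record_keys(record: Record) -> Set[str]:
    """Collect every normalized identifier a record can be matched on."""
    keys: Set[str] = set()
    for raw in record.get("isbns", []):
        isbn = normalize_isbn(raw)
        if isbn:
            keys.add("isbn:" + isbn)
    for raw in record.get("oclc", []):
        number = normalize_oclc(raw)
        if number:
            keys.add("oclc:" + number)
    return keys


def dedupe(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Keep the first record seen for each key; later records sharing any key are duplicates."""
    seen: Set[str] = set()
    kept: List[Record] = []
    dupes: List[Record] = []
    for record in records:
        keys = record_keys(record)
        if keys and keys & seen:
            dupes.append(record)
        else:
            kept.append(record)  # records with no usable identifier are kept
        seen |= keys
    return kept, dupes
```

Records are taken in the order given, so the file listed first wins when two vendors supply the same title. Converting 10-character ISBNs to 13 digits first means the same ISBN matches in either form. The print and electronic ISBNs of one title are still different keys unless both appear on a record.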

If you're comparing each file with what's already in your ILS, then it'll depend on the tools the ILS offers for matching incoming records to the database.  Or, export the database and compare it with the files, as above.

Andy Kohler / UCLA Library Info Tech
[log in to unmask] / 310 206-8312

On Thu, Aug 15, 2013 at 10:11 AM, Michael Beccaria <[log in to unmask]> wrote:

> Has anyone had any luck finding a good way to de-duplicate MARC
> records from ebook vendors? We're looking to integrate Ebrary and
> Ebsco Academic Ebook collections, and they estimate an overlap in the
> tens of thousands.