Thanks for the replies. To clarify, I am working with 2 (or more in the future) files of MARC records outside of the ILS.

I've tried MarcEdit, but my mileage varied: there wasn't much overlap on the control fields that were available to me, and I have a feeling their contents are a bit inconsistent. I'm also messing around with marcXimiL a little, but I'm having trouble getting it to output any records at all. I also looked at the XC aggregation module, but I had trouble getting that to work properly as well, and the listserv was unresponsive. It seemed like good software, but it required me to set up an OAI harvest source so it could ingest the records and that...well...enough is enough...

I think I will probably need to write something myself. At least that way I will know what it is doing, rather than plowing through software that has little to no support. Please feel free to let me know of a particular strategy you think might work best in this regard...

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
[log in to unmask]

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Andy Kohler
Sent: Thursday, August 15, 2013 2:29 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] De-dup MARC Ebook records

Are you expecting to work with two files of records, outside of your ILS?
If so, for a project like that I'd probably write Perl script(s) using MARC::Record (there are similar code libraries for Ruby, Python and Java at least).

For each record in each file, use the ISBN (and/or OCLC number and/or LCCN) as a key.  Compare all sets, and keep one record per key.

This assumes that the vendors are supplying records with standard identifiers, and not just their own record numbers.
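
To illustrate the keyed approach, here is a minimal sketch in Python rather than Perl. It is only an outline under some assumptions: each record is represented as a plain dict holding the raw identifier strings (for example the 020$a and 035$a values, which a MARC library such as pymarc could supply), and the names normalize_isbn, record_keys and dedupe are made up for this example. It matches on ISBN and OCLC number only; LCCN could be added the same way. ISBN-10s are converted to ISBN-13 so that the same title matches regardless of which form a vendor used.

```python
import re

def normalize_isbn(raw):
    """Return a 13-digit ISBN string, or None if no usable ISBN is found."""
    # Vendor ISBN fields often carry a qualifier, e.g. "... (electronic bk.)"
    m = re.match(r"\s*([0-9][0-9\- ]*[0-9Xx])", raw)
    if not m:
        return None
    digits = re.sub(r"[\- ]", "", m.group(1)).upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) != 10:
        return None
    # ISBN-10 -> ISBN-13: prefix 978, drop the old check digit, recompute
    core = "978" + digits[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)

def record_keys(rec):
    """Collect every standard identifier on a record as (type, value) pairs."""
    keys = set()
    for raw in rec.get("isbn", []):
        isbn = normalize_isbn(raw)
        if isbn:
            keys.add(("isbn", isbn))
    for raw in rec.get("oclc", []):
        # "(OCoLC)ocm00012345" and "12345" should compare equal
        num = re.sub(r"\D", "", raw).lstrip("0")
        if num:
            keys.add(("oclc", num))
    return keys

def dedupe(*files):
    """Keep the first record seen for each key; later matches are dropped."""
    seen = set()
    kept = []
    for records in files:
        for rec in records:
            keys = record_keys(rec)
            if keys & seen:
                continue  # shares an identifier with a record already kept
            seen |= keys
            kept.append(rec)  # records with no identifiers are always kept
    return kept

# Hypothetical sample data standing in for two vendor files
ebrary = [
    {"title": "A", "isbn": ["0-306-40615-2 (electronic bk.)"]},
    {"title": "B", "oclc": ["(OCoLC)ocm00012345"]},
]
ebsco = [
    {"title": "A (dup)", "isbn": ["9780306406157"]},
    {"title": "B (dup)", "oclc": ["12345"]},
    {"title": "C", "isbn": ["9781861972712"]},
]
kept = dedupe(ebrary, ebsco)
print([r["title"] for r in kept])
```

The order of the files decides which vendor's record wins, so the preferred source should be passed first. Records that carry no standard identifier at all are passed through untouched and would need a different match (title/author, for instance).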

If you're comparing each file with what's already in your ILS, then it'll depend on the tools the ILS offers for matching incoming records to the database.  Or, export the database and compare it with the files, as above.

Andy Kohler / UCLA Library Info Tech
[log in to unmask] / 310 206-8312

On Thu, Aug 15, 2013 at 10:11 AM, Michael Beccaria <[log in to unmask]> wrote:

> Has anyone had any luck finding a good way to de-duplicate MARC
> records from ebook vendors? We're looking to integrate Ebrary and
> Ebsco Academic Ebook collections, and they estimate an overlap in
> the tens of thousands.