Are you expecting to work with two files of records, outside of your ILS?
If so, for a project like that I'd probably write Perl script(s) using
MARC::Record (there are similar code libraries for Ruby, Python and Java).
For each record in each file, use the ISBN (and/or OCLC number and/or LCCN)
as a key. Compare all sets, and keep one record per key.
This assumes that the vendors are supplying records with standard
identifiers, and not just their own record numbers.
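The steps above could be sketched roughly like this in Python. This is only a sketch under some assumptions of mine: each record has already been reduced to a dict with hypothetical "isbn", "oclc" and "lccn" lists (in practice you'd pull those from the 020, 035 and 010 fields with whatever MARC library you use), ISBN-10s are converted to ISBN-13 so both forms match, and the first record seen for any key wins. The function names are made up for illustration.

```python
import re


def normalize_isbn(raw):
    """Return a 13-digit ISBN string, or None if raw does not look like an ISBN."""
    parts = raw.strip().split()
    if not parts:
        return None
    # Keep only the first token, so qualifiers like "(electronic bk.)" are dropped.
    digits = parts[0].replace("-", "").upper()
    if re.fullmatch(r"\d{13}", digits):
        return digits
    if re.fullmatch(r"\d{9}[\dX]", digits):
        # Convert ISBN-10 to ISBN-13: prefix 978, then recompute the check digit.
        core = "978" + digits[:9]
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def record_keys(record):
    """Build the set of (type, value) match keys for one record dict."""
    keys = set()
    for raw in record.get("isbn", []):
        isbn = normalize_isbn(raw)
        if isbn:
            keys.add(("isbn", isbn))
    # Assumes the caller passes only OCLC numbers here, not other 035 values.
    for raw in record.get("oclc", []):
        digits = re.sub(r"\D", "", raw).lstrip("0")
        if digits:
            keys.add(("oclc", digits))
    # Simplistic LCCN handling: remove whitespace and lowercase.
    for raw in record.get("lccn", []):
        lccn = "".join(raw.split()).lower()
        if lccn:
            keys.add(("lccn", lccn))
    return keys


def dedupe(*files):
    """Keep the first record seen for each key, across all input files."""
    seen = set()
    kept = []
    for records in files:
        for record in records:
            keys = record_keys(record)
            is_duplicate = bool(keys & seen)
            # Remember every key, so later records matching a dropped
            # duplicate's other identifiers are dropped as well.
            seen |= keys
            if not is_duplicate:
                kept.append(record)
    return kept
```

Records with no usable identifier are always kept, since there is nothing to match them on; those would need a separate pass (title/author matching, say) or manual review. The order of the files decides which vendor's record survives, so pass the preferred vendor first.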
If you're comparing each file with what's already in your ILS, then it'll
depend on the tools the ILS offers for matching incoming records to the
database. Or, export the database and compare it with the files, as above.
Andy Kohler / UCLA Library Info Tech
310 206-8312
On Thu, Aug 15, 2013 at 10:11 AM, Michael Beccaria wrote:
> Has anyone had any luck finding a good way to de-duplicate MARC records
> from ebook vendors? We're looking to integrate Ebrary and Ebsco Academic
> Ebook collections and they estimate an overlap into the tens of thousands.