http://staff.oclc.org/~levan/PearsTraining/scifi.usmarc has 10,000 MARC
records in it. They are part of the old SiteSearch system that OCLC
released as open source. They date back to 2002 and will not contain
any Unicode, if you were hoping to include that as part of your testing.
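If you want to sanity-check the file before loading it, a rough sketch
along these lines might help. It assumes the file is in the standard
binary MARC (ISO 2709) layout, where the first five bytes of each record
give its length in ASCII digits and each record ends with the 0x1D
terminator byte. The function names are made up for illustration, and I
have not run this against the scifi file itself.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte
LEADER_LENGTH = 24           # every MARC record starts with a 24-byte leader


def iter_marc_records(stream):
    """Yield each raw record (as bytes) from a binary MARC stream.

    Assumes leader positions 00-04 hold the total record length as
    ASCII digits, as in the standard MARC exchange format.
    """
    while True:
        head = stream.read(5)
        if not head:
            return  # clean end of file
        if len(head) < 5 or not head.isdigit():
            raise ValueError("bad record length field: %r" % head)
        length = int(head)
        if length <= LEADER_LENGTH:
            raise ValueError("record length too small: %d" % length)
        body = stream.read(length - 5)
        if len(body) != length - 5 or not body.endswith(RECORD_TERMINATOR):
            raise ValueError("truncated or unterminated record")
        yield head + body


def summarize(stream):
    """Return (total records, records containing any non-ASCII byte)."""
    total = 0
    non_ascii = 0
    for record in iter_marc_records(stream):
        total += 1
        if any(byte > 0x7F for byte in record):
            non_ascii += 1
    return total, non_ascii
```

With a local copy saved as scifi.usmarc (the local file name is an
assumption), calling summarize(open("scifi.usmarc", "rb")) should give
the record count and tell you whether any record has bytes outside
plain ASCII.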
Ralph
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Alexander Johannesen
Sent: Wednesday, January 11, 2012 5:36 AM
To: [log in to unmask]
Subject: Open datasets
Hiya,
I'm in the middle of creating a meta data management system (including
merging and persistent identifier management) for a somewhat different
domain (intranets and business integration), but it's based on Topic Maps
and so is well suited to other means of meta data handling / mangling. It's
also going to be open-source, and it might be well-suited to library tasks
as well.
So in order to test the integrity and performance of my system so far, I'm
wondering if there's a suitable open dataset of bibliographic records that
aren't too obscure (meaning, I can find the titles at Amazon or Open
Library) that you could recommend? More than 1000 records, but less than a
million, maybe?
Regards,
Alex