Those of us involved in the Blacklight and VuFind projects have been spending a lot of time recently thinking about MARC record indexing. We're about to start running some performance tests, and we want to create unit tests for our MARC-to-Solr indexer. People who want to download and play with the software also need easy access to a small but representative set of MARC records.

According to the combined brainstorming of Jonathan Rochkind and myself, the ideal record set should:

1. contain about 10k records: enough to really see the features, but small enough that you could index it in a few minutes on a typical desktop
2. contain a distribution of kinds of records, e.g., books, CDs, musical scores, DVDs, special collection items, etc.
3. contain a distribution of languages, so we can test Unicode handling
4. contain holdings information in addition to bib records
5. contain a distribution of typical errors one might encounter with MARC records in the wild

It seems to me that the set that Casey donated to Open Library (http://www.archive.org/details/marc_records_scriblio_net) would be a good place from which to draw records, because although IANAL, this seems to sidestep any legal hurdles. I'd also love to see the ability for the community to contribute test cases. Assuming such a set doesn't exist already (see my question below), this seems like the ideal sort of project for code4lib to host, too.

Since code4lib is my lazyweb, I'm asking you:

1. Does something like this exist already and I just don't know about it?
2. If not, do you have suggestions on how to go about making such a data set? I have some ideas on how to do it bit by bit, and we have a small set of records that we're already using for testing, but maybe there's a better method that I don't know about?
3. Are there features missing from the above list that would make this more useful?

Thoughts? Comments?

Thanks!
Bess

Elizabeth (Bess) Sadler
Research and Development Librarian
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904
(434) 243-2305
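
P.S. As a rough illustration of criterion 2 in the list above (a distribution of kinds of records), here is a minimal Python sketch of one way a sample could be drawn from a large file of binary MARC records. It is not what either project currently does. It assumes the input is concatenated MARC 21 transmission-format records, splits on the end-of-record byte (0x1D), and groups records by leader position 06 (type of record). The function names and the per_type parameter are hypothetical, chosen only for this example.

```python
import random
from collections import defaultdict

RECORD_TERMINATOR = b"\x1d"  # MARC 21 end-of-record byte
LEADER_LENGTH = 24           # the MARC 21 leader is always 24 bytes


def split_records(data: bytes) -> list:
    """Split a buffer of concatenated binary MARC records on the terminator."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR)
            if chunk.strip()]  # skip empty or whitespace-only trailing chunks


def record_type(record: bytes) -> str:
    """Return leader/06 (type of record), or '?' if there is no full leader."""
    return chr(record[6]) if len(record) >= LEADER_LENGTH else "?"


def stratified_sample(records, per_type, seed=0):
    """Pick up to per_type records of each record type, reproducibly."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for rec in records:
        by_type[record_type(rec)].append(rec)
    sample = []
    for kind in sorted(by_type):
        group = by_type[kind]
        sample.extend(rng.sample(group, min(per_type, len(group))))
    return sample
```

Records too short to hold a leader are kept under the type '?' rather than discarded, since malformed records are part of what the test set is meant to include (criterion 5). For anything beyond a sketch, a real MARC parser such as pymarc or marc4j would be a better basis than splitting bytes by hand.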