On Fri, May 9, 2008 at 2:23 PM, Joe Hourcle wrote:
> OpenLibrary has other datasets that you might be able to use / combine /
> whatever to meet your requirements:
>
> http://openlibrary.org/dev/docs/data

This'll get you the other MARC dumps that have been made available to IA
through OL:

http://www.archive.org/search.php?query=collection%3Aol_data%20marc

Lots to work with here. I also wonder whether, rather than one large test
set, it would be better to have smaller test sets that each exhibit a
particular problem or cover a particular type of material (e.g. music).

Jason
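
A rough sketch of the "smaller test sets" idea, in case it helps: it splits a raw MARC (ISO 2709) dump into a music set and everything else by looking at Leader/06 (type of record). The function names, the bucket names, and the choice of `c`, `d`, and `j` as the "music" codes are assumptions made for illustration; nothing here comes from the OL dumps themselves, and it assumes well-formed records with correct length fields.

```python
from collections import defaultdict

# MARC 21 Leader/06 codes treated as "music" in this sketch (an assumption):
# c = notated music, d = manuscript notated music, j = musical sound recording
MUSIC_TYPES = {"c", "d", "j"}

LEADER_LENGTH = 24


def iter_marc_records(data: bytes):
    """Yield raw records, using the 5-digit record length at the start of each leader."""
    pos = 0
    while pos + LEADER_LENGTH <= len(data):
        length = int(data[pos:pos + 5])
        if length < LEADER_LENGTH:
            raise ValueError(f"implausible record length {length} at offset {pos}")
        yield data[pos:pos + length]
        pos += length


def split_by_type(data: bytes):
    """Group raw records into 'music' and 'other' buckets by Leader/06."""
    buckets = defaultdict(list)
    for record in iter_marc_records(data):
        record_type = chr(record[6])
        key = "music" if record_type in MUSIC_TYPES else "other"
        buckets[key].append(record)
    return buckets
```

Each bucket could then be written out as its own small test file, for example by joining `buckets["music"]` with `b"".join(...)` and writing it to a hypothetical `music.mrc`. The same pattern would extend to sets that exhibit a particular problem by swapping the Leader/06 check for whatever test identifies that problem.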