If you are looking for MARC records, here is a very large file of 
Library of Congress MARC records on the Internet Archive:

It is broken into chunks. The early chunks are unlikely to contain many 
interesting characters (diacritics and other non-ASCII), so you might 
start with a later file.
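
If you want a rough check on a chunk before running a full conversion 
over it, something like the following might help. It is only a sketch: it 
assumes the chunk is a plain ISO 2709 file (records ending in the 0x1D 
record terminator), and it simply counts bytes at or above 0x80 without 
decoding anything. The function name and the file name in the comment are 
made up for illustration.

```python
from collections import Counter

# ISO 2709 (MARC transmission format) ends each record with byte 0x1D.
RECORD_TERMINATOR = b"\x1d"


def tally_high_bytes(data: bytes):
    """Crude filter: count records that contain any byte >= 0x80.

    Returns (total_records, records_with_high_bytes, Counter of high bytes).
    Nothing is decoded, so this only hints at where non-ASCII data lives.
    """
    records = [r for r in data.split(RECORD_TERMINATOR) if r.strip()]
    counts = Counter()
    flagged = 0
    for rec in records:
        high = [b for b in rec if b >= 0x80]
        if high:
            flagged += 1
            counts.update(high)
    return len(records), flagged, counts


# Example use (hypothetical file name):
# with open("chunk_30.mrc", "rb") as fh:
#     total, flagged, counts = tally_high_bytes(fh.read())
# print(total, flagged, counts.most_common(10))
```

A chunk where few records are flagged is probably not worth the time; one 
with a wide spread of high byte values is a better candidate.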


On 12/26/12 4:57 PM, Marc Chantreux wrote:
> hello perl mongers and librarians,
> I just released an "almost working" iso5426 ucm and Encode::ISO5426 on
> github. This is an XS module way faster than the Koha C4::Charset and
> the encode (to iso5426) feature works.
> I know there are missing chars in the table, so my goal is now to run
> MARC::MIR on the largest set of records I can to find them. Also, if
> someone has a (even incomplete) test suite or table, I would be really
> pleased to read it.
> Any other feedback is very welcome.
> regards,

Karen Coyle
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet