On Wed, Dec 16, 2009 at 11:24 AM, Walker, David <[log in to unmask]> wrote:

> If you're looking to convert that data to UTF-8 (which I assume you would), then your best friend is a program from Index Data called yaz-marcdump, which comes with the Yaz toolkit.  It runs on Linux and Windows, and can be invoked from the command line or from scripts to quickly and painlessly convert your catalog data into UTF-8.

Do keep in mind that if you've got a *mix* of character encodings in
your database, you may have a Big Annoying Problem. Unless you know
which records are in which encoding, there's no general way to do a
conversion.

You can use the sweet sweet Python 'chardet' library to get a good
idea of what encoding things are in, and maybe run things through
iconv to normalize them to UTF-8.
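
Here's a rough sketch of that idea, doing the conversion in Python itself
rather than shelling out to iconv. The function name to_utf8 and the 0.8
confidence cutoff are made up for illustration; chardet.detect() only
returns a guess plus a confidence score, so anything below the cutoff is
left alone for a human to look at.

```python
def to_utf8(raw, detect=None, min_confidence=0.8):
    """Return raw bytes re-encoded as UTF-8, or None if the guess is too weak.

    to_utf8 and min_confidence are illustrative names, not part of chardet.
    """
    if detect is None:
        import chardet  # third-party: pip install chardet
        detect = chardet.detect

    # chardet.detect() returns a dict with 'encoding' and 'confidence' keys.
    guess = detect(raw)
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0

    # Refuse to convert when the detector has no answer or is unsure.
    if encoding is None or confidence < min_confidence:
        return None

    try:
        return raw.decode(encoding).encode("utf-8")
    except (UnicodeDecodeError, LookupError):
        # The guess was wrong, or Python has no codec by that name.
        return None
```

Run it per record (or per field), not over the whole dump, since with
mixed encodings a single guess for the whole file is exactly what goes
wrong. Records that come back as None go in a pile for manual review.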

Cheers,
-Nate