The only language that I know of with a library for reading Marc8 and
converting it to another encoding (such as UTF-8) is Java. The Marc4J
package will do it. I suppose there may be C libraries too; is yaz
written in C?

As Michael suggests, the easiest thing to do (if you're not in Java) is
probably to use the 'yaz' tools to convert to UTF-8 before anything else
touches it.

If you do end up writing a Marc8 handling library in another language
like Perl (presumably you could use the Java code in Marc4J as a guide),
please do share! Heh.

On 10/24/2011 2:34 PM, Doran, Michael D wrote:
> Hi Eric,
>
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
> You can't. MARC-8 is a character set that is unknown to the operating
> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
>
>> ...it is converted to Perl's
>> internal encoding (UTF-8)
> As an FYI, UTF-8 is *not* Perl's internal encoding.
>
> -- Michael
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # [log in to unmask]
> # http://rocky.uta.edu/doran/
>
>
>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric
>> Lease Morgan
>> Sent: Monday, October 24, 2011 1:18 PM
>> To: [log in to unmask]
>> Subject: [CODE4LIB] marc-8
>>
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
>>
>> Character encoding is the bane of my existence. I have learned that when
>> reading from a file I ought to specify the type of encoding the file is in
>> and decode accordingly, or else. Once read, it is converted to Perl's
>> internal encoding (UTF-8) and can be manipulated. Similarly, when writing I
>> am expected to specify the encoding. Both the reading (decoding) and the
>> writing (encoding) can be done with the Encode module.
>> Here is some code
>> illustrating what I'm trying to do with MARC records which are apparently in
>> MARC-8:
>>
>>   # require
>>   use Encode qw( encode decode );
>>   use MARC::Batch;
>>
>>   # initialize
>>   my $batch = MARC::Batch->new( 'USMARC', './records.mrc' );
>>   open OUT, '> updated.mrc';
>>
>>   # process each record
>>   while ( my $marc = $batch->next ) {
>>
>>     # get the title
>>     my $_245 = decode( 'FOO', $marc->title );
>>
>>     # do cool stuff with the title here
>>
>>     # output the cool stuff
>>     print OUT encode( 'FOO', $_245 );
>>
>>   }
>>
>>   # done
>>   close OUT;
>>   exit;
>>
>>
>> My problem is, I don't know what to put in place of FOO. What is the official
>> name of MARC-8's encoding scheme?
>>
>> --
>> Eric "The Ugly American" Morgan
>> University of Notre Dame
>>
>> (574) 631-8604
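
For the yaz route suggested at the top of this message, a minimal sketch
might look like the following. It assumes the yaz-marcdump utility from
the YAZ toolkit is installed. The function name marc8_to_utf8 and the
file names are made up for illustration and are not part of YAZ. The
-l 9=97 option sets leader position 9 to 'a', which flags the output
records as Unicode.

```shell
#!/bin/sh
# Sketch only: convert a file of MARC-8 records to UTF-8 with yaz-marcdump.
# marc8_to_utf8 is a hypothetical helper name, not something YAZ provides.
marc8_to_utf8() {
    in_file=$1
    out_file=$2

    # Bail out early if the YAZ command-line tool is not on the PATH.
    if ! command -v yaz-marcdump >/dev/null 2>&1; then
        echo "yaz-marcdump not found; install the YAZ toolkit first" >&2
        return 127
    fi

    # -f / -t : source and target character sets
    # -o marc : write binary MARC (ISO 2709) instead of the line dump
    # -l 9=97 : set leader byte 9 to 'a' (ASCII 97) to mark Unicode
    yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 "$in_file" > "$out_file"
}

# Example (file names are placeholders):
#   marc8_to_utf8 records.mrc records-utf8.mrc
```

The converted file can then be given to the Perl script in place of the
original, so that the only encoding the script has to name is UTF-8.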