The only language I know of with a library for reading MARC-8 and
converting it to another encoding (such as UTF-8) is Java. The MARC4J
package will do it.
I suppose there may be C libraries too; is yaz written in C?
As Michael suggests, the easiest thing to do (if you're not in Java) is
probably to use the 'yaz' tools to convert to UTF-8 before anything else
touches the data.
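For what it's worth, here is a rough sketch of that conversion step using yaz-marcdump, the command-line tool that ships with yaz. It assumes yaz is installed and on your PATH; the function name and the file names in the usage line are made up for illustration.

```shell
# Hypothetical helper (the name is illustrative, not part of yaz):
# convert a file of MARC-8 records ($1) into UTF-8 records ($2).
convert_marc8_to_utf8() {
    # -f / -t : source and target character sets
    # -o marc : write binary MARC (ISO 2709) output
    # -l 9=97 : set leader position 9 to "a" (ASCII 97) to flag the record as Unicode
    yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 "$1" > "$2"
}
```

You would call it as something like `convert_marc8_to_utf8 records.mrc records-utf8.mrc` and then point the Perl script at the converted file, so that MARC-8 never reaches Perl at all.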
If you do end up writing a MARC-8 handling library in another language
like Perl (presumably you could use the Java code in MARC4J as a guide),
please do share! Heh.
On 10/24/2011 2:34 PM, Doran, Michael D wrote:
> Hi Eric,
>
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
> You can't. MARC-8 is a character set that is unknown to the operating system. Your best bet is to convert MARC-8-encoded records into UTF-8.
>
>> ...it is converted to Perl's
>> internal encoding (UTF-8)
> As an FYI, UTF-8 is *not* Perl's internal encoding.
>
> -- Michael
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # [log in to unmask]
> # http://rocky.uta.edu/doran/
>
>
>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric
>> Lease Morgan
>> Sent: Monday, October 24, 2011 1:18 PM
>> To: [log in to unmask]
>> Subject: [CODE4LIB] marc-8
>>
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
>>
>> Character encoding is the bane of my existence. I have learned that when
>> reading from a file I ought to specify the type of encoding the file is in
>> and decode accordingly, or else. Once read, it is converted to Perl's
>> internal encoding (UTF-8) and can be manipulated. Similarly, when writing I
>> am expected to specify the encoding. Both the reading (decoding) and the
>> writing (encoding) can be done with the Encode module. Here is some code
>> illustrating what I'm trying to do with MARC records which are apparently in
>> MARC-8:
>>
>> # require
>> use strict;
>> use warnings;
>> use Encode qw( encode decode );
>> use MARC::Batch;
>>
>> # initialize
>> my $batch = MARC::Batch->new( 'USMARC', './records.mrc' );
>> open my $out, '>', 'updated.mrc' or die "Can't open updated.mrc: $!";
>>
>> # process each record
>> while ( my $marc = $batch->next ) {
>>
>>     # get the title
>>     my $_245 = decode( 'FOO', $marc->title );
>>
>>     # do cool stuff with the title here
>>
>>     # output the cool stuff
>>     print {$out} encode( 'FOO', $_245 );
>>
>> }
>>
>> # done
>> close $out;
>> exit;
>>
>>
>> My problem is, I don't know what to put in place of FOO. What is the official
>> name of MARC-8's encoding scheme?
>>
>> --
>> Eric "The Ugly American" Morgan
>> University of Notre Dame
>>
>> (574) 631-8604