The only language that I know of with a library for reading Marc8 and 
converting to another encoding (such as UTF-8) is Java. The Marc4J 
package will do it.

I suppose there may be C libraries too; is yaz written in C?

As Michael suggests, the easiest thing to do (if you're not in Java) is 
probably to use the 'yaz' tools to convert to UTF-8 before anything else 
touches it.
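
For what it's worth, here is a rough sketch of that pre-conversion step driven from Perl. It assumes the yaz-marcdump tool from the yaz package is installed and on your PATH. The sub names and the output file name (records-utf8.mrc) are made up for illustration, and I have not tried it against real data, so treat it as a starting point only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the yaz-marcdump argument list for a MARC-8 to UTF-8 conversion.
# (Hypothetical helper name; the flags are yaz-marcdump's own.)
sub marc8_to_utf8_command {
    my ($in) = @_;
    return (
        'yaz-marcdump',
        '-f', 'MARC-8',   # character set of the input records
        '-t', 'UTF-8',    # character set to convert to
        '-o', 'marc',     # write binary MARC (ISO 2709), not the line format
        '-l', '9=97',     # set leader position 9 to 'a' (code 97) to flag Unicode
        $in,
    );
}

# Run the conversion and copy yaz-marcdump's standard output to $out.
sub convert_file {
    my ( $in, $out ) = @_;

    # list-form pipe open, so the file name never passes through a shell
    open my $from, '-|', marc8_to_utf8_command($in)
        or die "Cannot run yaz-marcdump: $!";
    binmode $from;

    open my $to, '>:raw', $out
        or die "Cannot write $out: $!";

    # copy in fixed-size chunks; the data is binary, not line-oriented
    local $/ = \65536;
    while ( my $chunk = <$from> ) {
        print {$to} $chunk;
    }

    close $to   or die "Cannot close $out: $!";
    close $from or die "yaz-marcdump failed (exit status $?)";
}

# e.g. perl convert.pl records.mrc records-utf8.mrc
convert_file(@ARGV) if @ARGV == 2;
```

After that, MARC::Batch can be pointed at records-utf8.mrc instead of the original file, and the rest of the script only ever sees UTF-8.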

If you do end up writing a Marc8 handling library in another language 
like Perl (presumably you could use the Java code in Marc4J as a guide), 
please do share! Heh.

On 10/24/2011 2:34 PM, Doran, Michael D wrote:
> Hi Eric,
>
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
> You can't.  MARC-8 is a character set that is unknown to the operating system.  Your best bet is to convert MARC-8-encoded records into UTF-8.
>
>> ...it is converted to Perl's
>> internal encoding (UTF-8)
> As an FYI, UTF-8 is *not* Perl's internal encoding.
>
> -- Michael
>
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # [log in to unmask]
> # http://rocky.uta.edu/doran/
>
>
>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric
>> Lease Morgan
>> Sent: Monday, October 24, 2011 1:18 PM
>> To: [log in to unmask]
>> Subject: [CODE4LIB] marc-8
>>
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
>>
>> Character encoding is the bane of my existence. I have learned that when
>> reading from a file I ought to specify the type of encoding the file is in
>> and decode accordingly, or else. Once read, it is converted to Perl's
>> internal encoding (UTF-8) and can be manipulated. Similarly, when writing I
>> am expected to specify the encoding. Both the reading (decoding) and the
>> writing (encoding) can be done with the Encode module. Here is some code
>> illustrating what I'm trying to do with MARC records which are apparently in
>> MARC-8:
>>
>>    # require
>>    use MARC::Batch;
>>    use Encode qw( encode decode );
>>
>>    # initialize
>>    my $batch = MARC::Batch->new( 'USMARC', './records.mrc' );
>>    open OUT, '>', 'updated.mrc' or die "Can't open updated.mrc: $!";
>>
>>    # process each record
>>    while ( my $marc = $batch->next ) {
>>
>>      # get the title
>>      my $_245 = decode( 'FOO', $marc->title );
>>
>>      # do cool stuff with the title here
>>
>>      # output the cool stuff
>>      print OUT encode( 'FOO', $_245 );
>>
>>    }
>>
>>    # done
>>    close OUT;
>>    exit;
>>
>>
>> My problem is, I don't know what to put in place of FOO. What is the official
>> name of MARC-8's encoding scheme?
>>
>> --
>> Eric "The Ugly American" Morgan
>> University of Notre Dame
>>
>> (574) 631-8604