Hi Eric,
> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> (encoding) data?
You can't. MARC-8 is a character set that is unknown to the operating system. Your best bet is to convert MARC-8-encoded records into UTF-8.
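Here is a minimal sketch of that conversion for the title field, assuming the CPAN module MARC::Charset is installed alongside MARC::Batch. The helper name title_from_marc8 and the file name passed on the command line are placeholders, and the sketch assumes marc8_to_utf8 returns a Perl character string that still needs encoding on output:

```perl
use strict;
use warnings;
use MARC::Batch;
use MARC::Charset qw( marc8_to_utf8 );

# Hypothetical helper: return the title of a MARC-8 record as a Perl string.
sub title_from_marc8 {
    my ($record) = @_;
    return marc8_to_utf8( $record->title );
}

# Usage (file name is a placeholder): perl titles.pl records.mrc
if (@ARGV) {
    my $batch = MARC::Batch->new( 'USMARC', $ARGV[0] );

    # Assumption: the converted title is a character string, so it is
    # encoded as UTF-8 when printed.
    binmode STDOUT, ':encoding(UTF-8)';

    while ( my $record = $batch->next ) {
        # Leader position 9 is blank for MARC-8 and 'a' for Unicode records;
        # skip anything that is not flagged as MARC-8.
        next unless substr( $record->leader, 9, 1 ) eq ' ';
        print title_from_marc8($record), "\n";
    }
}
```

This only prints titles. Writing whole records back out as UTF-8 would also mean converting every field and setting leader position 9 to 'a', which the sketch does not attempt.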
> ...it is converted to Perl's
> internal encoding (UTF-8)
As an FYI, UTF-8 is *not* Perl's internal encoding.
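One way to see the distinction, as a small sketch using only the core Encode module: decode() turns a string of bytes into a string of characters, and the program then works with characters. How Perl stores those characters internally is an implementation detail that code should not rely on. The sample word is only an illustration:

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Five bytes: the UTF-8 encoding of the four-character word "café".
my $bytes = "caf\xC3\xA9";

# decode() gives a character string; the e-acute is now one character.
my $chars = decode( 'UTF-8', $bytes );

# encode() goes the other way and reproduces the original bytes.
my $round_trip = encode( 'UTF-8', $chars );

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 5, characters: 4"
```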
-- Michael
# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# http://rocky.uta.edu/doran/
> -----Original Message-----
> From: Code for Libraries On Behalf Of Eric
> Lease Morgan
> Sent: Monday, October 24, 2011 1:18 PM
> Subject: [CODE4LIB] marc-8
>
> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> (encoding) data?
>
> Character encoding is the bane of my existence. I have learned that when
> reading from a file I ought to specify the type of encoding the file is in
> and decode accordingly, or else. Once read, it is converted to Perl's
> internal encoding (UTF-8) and can be manipulated. Similarly, when writing I
> am expected to specify the encoding. Both the reading (decoding) and the
> writing (encoding) can be done with the Encode module. Here is some code
> illustrating what I'm trying to do with MARC records which are apparently in
> MARC-8:
>
> # require
> use MARC::Batch;
> use Encode qw( encode decode );
>
> # initialize
> my $batch = MARC::Batch->new( 'USMARC', './records.mrc' );
> open my $out, '>', 'updated.mrc' or die "Can't open updated.mrc: $!";
>
> # process each record
> while ( my $marc = $batch->next ) {
>
>     # get the title
>     my $_245 = decode( 'FOO', $marc->title );
>
>     # do cool stuff with the title here
>
>     # output the cool stuff
>     print {$out} encode( 'FOO', $_245 );
>
> }
>
> # done
> close $out;
> exit;
>
>
> My problem is, I don't know what to put in place of FOO. What is the official
> name of MARC-8's encoding scheme?
>
> --
> Eric "The Ugly American" Morgan
> University of Notre Dame
>
> (574) 631-8604