How do I trap for unwanted (bogus) characters in MARC records?
I have a set of Internet Archive identifiers, and have written the following Perl loop to get the MARC record associated with each one:
    # process each identifier
    my $ua = LWP::UserAgent->new( agent => AGENT );
    while ( <DATA> ) {

        # get the identifier; chomp removes only a trailing newline
        chomp;
        my $identifier = $_;
        print $identifier, "\n";

        # get its corresponding MARC record
        my $response = $ua->get( ROOT . "$identifier/$identifier" . "_meta.mrc" );
        if ( ! $response->is_success ) {
            warn $response->status_line;
            next;
        }

        # save it
        open MARC, " > $identifier.mrc" or die "Can't open $identifier.mrc: $!\n";
        binmode MARC, ":utf8";
        print MARC $response->content;
        close MARC;

    }
I then use the venerable marcdump to see the fruits of my labors: marcdump *.mrc. Unfortunately, marcdump returns the following error against (at least) one of my files:
    bienfaitsducatho00pina.mrc
    utf8 "\xC3" does not map to Unicode at /System/Library/Perl/5.10.0/darwin-thread-multi-2level/Encode.pm line 162.
What is going on here? Am I saving my files incorrectly? Is the original MARC data inherently incorrect? Is there some way I can fix the MARC record in question?
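To help narrow down the "saving incorrectly" possibility, here is a minimal sketch of one check. It assumes the content of the response is a string of raw bytes and that it is written through a ":utf8" layer, as in the loop above. The two-byte sample and the size_written_with helper are hypothetical, made up for illustration; they are not taken from the record in question. The sketch writes the same bytes to a temporary file twice, once through ":raw" and once through ":utf8", and compares the resulting file sizes:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# hypothetical sample: the UTF-8 bytes for "é" (0xC3 0xA9), standing in
# for raw bytes as they might arrive in an HTTP response body
my $bytes = "\xC3\xA9";

# write $bytes to a temporary file through the given I/O layer and
# return the size of the resulting file in bytes
sub size_written_with {
    my ($layer) = @_;
    my ( $fh, $filename ) = tempfile( UNLINK => 1 );
    binmode $fh, $layer;
    print {$fh} $bytes;
    close $fh or die "Can't close $filename: $!\n";
    return -s $filename;
}

my $raw_size  = size_written_with(':raw');
my $utf8_size = size_written_with(':utf8');

print "written with :raw  -> $raw_size bytes\n";
print "written with :utf8 -> $utf8_size bytes\n";
```

If the second size comes out larger than the first, the saved file no longer holds the same bytes that were downloaded, and the lengths and offsets recorded in the MARC leader and directory would no longer line up with the data. That might account for marcdump's complaint, but it is only a guess, not a confirmed diagnosis; the original record could still be at fault.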
--
Eric Lease Morgan