How do I trap for unwanted (bogus) characters in MARC records?

I have a set of Internet Archive identifiers, and have written the following Perl loop to get the MARC record associated with each one:

  # process each identifier
  my $ua = LWP::UserAgent->new( agent => AGENT );
  while ( <DATA> ) {
    
    # get the identifier
    chomp;
    my $identifier = $_;
    print $identifier, "\n";
    
    # get its corresponding MARC record
    my $response = $ua->get( ROOT . "$identifier/$identifier" . "_meta.mrc" );
    if ( ! $response->is_success ) {
    
      warn $response->status_line;
      next;
      
    }
  
    # save it
    open MARC, " > $identifier.mrc" or die "Can't open $identifier.mrc: $!\n";
    binmode MARC, ":utf8";
    print MARC $response->content;
    close MARC;
  
  }

I then use the venerable marcdump to see the fruits of my labors: marcdump *.mrc. Unfortunately, marcdump returns the following error against (at least) one of my files:

  bienfaitsducatho00pina.mrc
  utf8 "\xC3" does not map to Unicode at /System/Library/
  Perl/5.10.0/darwin-thread-multi-2level/Encode.pm line 162.

What is going on here? Am I saving my files incorrectly? Is the original MARC data inherently incorrect? Is there some way I can fix the MARC record in question?
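
To narrow this down, here is a minimal sketch of two checks that could be run against the saved files. It assumes each .mrc file holds exactly one record, which I have not verified for every identifier. The first five bytes of a MARC leader give the record length in bytes, so a mismatch with the file size would suggest the bytes changed somewhere between download and save. Leader position 09 is "a" when a record claims to be Unicode, and only then is the UTF-8 check meaningful. The subroutine names length_matches and is_valid_utf8 are hypothetical helpers, not part of any MARC module.

```perl
use strict;
use warnings;
use Encode qw( decode FB_CROAK LEAVE_SRC );

# true if the length declared in leader positions 00-04 equals the byte count
sub length_matches {
  my ( $bytes ) = @_;
  return 0 unless $bytes =~ /\A(\d{5})/;
  return $1 == length( $bytes ) ? 1 : 0;
}

# true if the byte string decodes cleanly as strict UTF-8
sub is_valid_utf8 {
  my ( $bytes ) = @_;
  return eval { decode( 'UTF-8', $bytes, FB_CROAK | LEAVE_SRC ); 1 } ? 1 : 0;
}

# report on each file named on the command line
foreach my $file ( @ARGV ) {

  # slurp the file as raw bytes, with no encoding layer
  open my $fh, '<:raw', $file or die "Can't open $file: $!\n";
  my $bytes = do { local $/; <$fh> };
  close $fh;

  # leader position 09 is 'a' when the record claims to be Unicode
  my $claims_utf8 = length( $bytes ) > 9 && substr( $bytes, 9, 1 ) eq 'a';

  print $file, "\n";
  print "  declared length matches file size: ", ( length_matches( $bytes ) ? 'yes' : 'no' ), "\n";
  print "  well-formed UTF-8: ", ( is_valid_utf8( $bytes ) ? 'yes' : 'no' ), "\n" if $claims_utf8;

}
```

Saved under a name of your choosing and run with the .mrc files as arguments, it prints one short report per file. A record that fails only the length check was probably altered after it left the server, while one that fails only the UTF-8 check may have been bad at the source.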

-- 
Eric Lease Morgan