Eric, is this your source file?

http://ia341306.us.archive.org/1/items/bienfaitsducatho00pina/bienfaitsducatho00pina_meta.mrc

I have nothing really much to offer with regard to MARC.pm and its ilk, but I thought it might help people track down your problem.

FWIW, yaz-marcdump spits out this on that record:

$ yaz-marcdump bienfait.mrc
00795cam a2200277 a 4500
001 1556719
003 CaOTULAS
005 19931129144435.0
008 780210s1842 fr fre d
035 $a (Sirsi) AZF-9578
040 $a NUC $c NUC $d otsm
049 $a otstm $b eng
050 04 $a BX946 $b .P5
055 3 $a BX946 $b .P55 1842
090 8 $a BX 946 .P55 1842 $b SMRS
100 10 $a Pinard, Clovis, $d d.1865.
245 10 $a Bienfaits du Catholicisme dans la société / $c par l'abbé P
(No separator at end of field length=71)
260 na $d .
(Separator but not at end of field length=26)
300 18 $2 .
(Separator but not at end of field length=11)
490 00 $p .
(Separator but not at end of field length=45)
(Bad indicator data. Skipping 2 bytes)
596 ?t $e nn
(No separator at end of field length=7)
610 ne
(Separator but not at end of field length=30)
948 xH $s tory.
(Separator but not at end of field length=27)
039 0/ $6 /199
(No separator at end of field length=9)
(Bad indicator data. Skipping 1 bytes)
093 0 $f mcsk
(Separator but not at end of field length=21)
926 12 $1 44434
(Separator but not at end of field length=48)

The diacritics definitely look pretty sketchy there. In fact, I just tried this with every encoding in yaz-marcdump, and the diacritics never properly converted to UTF-8.

They seem ok here:

http://ia341306.us.archive.org/1/items/bienfaitsducatho00pina/bienfaitsducatho00pina_marc.xml

though, so you might want to grab both binary MARC and MARCXML and fall back to the latter in case of encoding errors.

-Ross.

On Thu, Oct 7, 2010 at 6:51 AM, Eric Lease Morgan wrote:
> How do I trap for unwanted (bogus) characters in MARC records?
>
> I have a set of Internet Archive identifiers, and have written the following Perl loop to get the MARC records associated with each one:
>
>   # process each identifier
>   my $ua = LWP::UserAgent->new( agent => AGENT );
>   while ( <DATA> ) {
>
>     # get the identifier
>     chop;
>     my $identifier = $_;
>     print $identifier, "\n";
>
>     # get its corresponding MARC record
>     my $response = $ua->get( ROOT . "$identifier/$identifier" . "_meta.mrc" );
>     if ( ! $response->is_success ) {
>
>       warn $response->status_line;
>       next;
>
>     }
>
>     # save it
>     open MARC, " > $identifier.mrc" or die "Can't open $identifier.mrc: $!\n";
>     binmode MARC, ":utf8";
>     print MARC $response->content;
>     close MARC;
>
>   }
>
> I then use the venerable marcdump to see the fruits of my labors: marcdump *.mrc. Unfortunately, marcdump returns the following error against (at least) one of my files:
>
>   bienfaitsducatho00pina.mrc
>   utf8 "\xC3" does not map to Unicode at /System/Library/Perl/5.10.0/darwin-thread-multi-2level/Encode.pm line 162.
>
> What is going on here? Am I saving my files incorrectly? Is the original MARC data inherently incorrect? Is there some way I can fix the MARC record in question?
>
> --
> Eric Lease Morgan
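
For the "fall back to MARCXML" idea, one way to decide when to fall back is to check that a record's directory agrees with its data, which is roughly what the yaz-marcdump messages above are complaining about. What follows is only a minimal sketch in core Perl, assuming the standard binary MARC layout (24-byte leader, 12-byte directory entries, 0x1E field terminator, 0x1D record terminator). The function name marc_structure_problems and the script name check_marc.pl are made up for illustration and are not part of MARC::Record or any other module.

```perl
use strict;
use warnings;

use constant FIELD_TERMINATOR  => "\x1E";
use constant RECORD_TERMINATOR => "\x1D";

# Hypothetical helper (not part of MARC::Record): list the structural problems
# in one raw binary MARC record. An empty list means the leader, the
# directory, and the field data agree with each other.
sub marc_structure_problems {
    my ($raw) = @_;    # raw bytes, exactly as downloaded
    my @problems;

    return ('shorter than the 24-byte leader') if length($raw) < 24;

    # Leader bytes 0-4 hold the record length and bytes 12-16 the base
    # address of the data, both as zero-padded decimal byte counts.
    my $record_length = substr $raw, 0,  5;
    my $base_address  = substr $raw, 12, 5;
    return ('leader lengths are not numeric')
        unless "$record_length$base_address" =~ /^\d{10}$/;
    return ('base address points outside the record')
        if $base_address < 25 || $base_address > length($raw);

    push @problems,
        sprintf( 'leader says %d bytes, record has %d', $record_length, length($raw) )
        if $record_length != length($raw);
    push @problems, 'no record terminator at end of record'
        unless substr( $raw, -1 ) eq RECORD_TERMINATOR;
    push @problems, 'no field terminator at end of directory'
        unless substr( $raw, $base_address - 1, 1 ) eq FIELD_TERMINATOR;

    # Each directory entry is 12 bytes: tag (3), field length (4), start (5).
    my $directory = substr $raw, 24, $base_address - 25;
    push @problems, 'directory length is not a multiple of 12'
        if length($directory) % 12;

    for ( my $i = 0 ; $i + 12 <= length($directory) ; $i += 12 ) {
        my ( $tag, $length, $start ) = unpack 'a3 a4 a5', substr( $directory, $i, 12 );
        if ( "$length$start" !~ /^\d{9}$/ ) {
            push @problems, "$tag: directory entry is not numeric";
            next;
        }
        $length += 0;    # drop the zero padding for arithmetic and messages
        if ( $base_address + $start + $length > length($raw) ) {
            push @problems, "$tag: field runs past end of record (length=$length)";
            next;
        }

        # A well-formed field has exactly one terminator, as its last byte.
        my $field = substr $raw, $base_address + $start, $length;
        my $pos   = index $field, FIELD_TERMINATOR;
        if ( $pos == -1 ) {
            push @problems, "$tag: no field terminator (length=$length)";
        }
        elsif ( $pos != $length - 1 ) {
            push @problems, "$tag: field terminator before end of field (length=$length)";
        }
    }
    return @problems;
}

# Usage (assumed script name): perl check_marc.pl file.mrc [...]
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Can't open $file: $!\n";
    local $/ = RECORD_TERMINATOR;    # read one record at a time
    while ( my $raw = <$fh> ) {
        my @problems = marc_structure_problems($raw);
        print "$file: ", ( @problems ? join( '; ', @problems ) : 'ok' ), "\n";
    }
}
```

Any file that reports problems would be a candidate for fetching the _marc.xml version instead. Note the ':raw' layer on the read: the lengths and offsets in binary MARC are byte counts, so the bytes have to stay untouched on disk. Printing the downloaded content through a ':utf8' layer, as the quoted loop does, re-encodes every byte above 0x7F as two bytes and shifts everything after it, so saving with a plain binmode (no layer) is probably worth trying before blaming the source record. That said, a record can also arrive with lengths that do not match its bytes, so this check cannot say which side introduced the damage.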