The marcxml version of the record looks fine:
http://ia341306.us.archive.org/1/items/bienfaitsducatho00pina/bienfaitsducatho00pina_marc.xml
Perhaps there was a problem when the record was converted from marcxml to marc at some point.
-Tim
Tim Prettyman
University of Michigan/LIT
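The fallback Ross suggests further down the thread (use the binary MARC when it is clean, otherwise the MARCXML) could be sketched roughly as below. This is only an illustration under stated assumptions: is_valid_utf8 and choose_source are hypothetical names, and the check only makes sense for records that are supposed to be UTF-8 in the first place.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return 1 if $bytes is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Hypothetical helper: decide which downloaded file to trust.
sub choose_source {
    my ($marc_bytes) = @_;
    return is_valid_utf8($marc_bytes) ? 'mrc' : 'marcxml';
}

print choose_source("soci\xC3\xA9t\xC3\xA9"), "\n";    # well-formed UTF-8
print choose_source("soci\xC3t\xC3"), "\n";            # broken sequences
```

A byte-level check like this catches malformed sequences such as the lone \xC3 in the error message, but it will not notice text that was double-encoded into valid UTF-8.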
On 10/7/10 9:58 AM, "Cowles, Esme" wrote:
This record has the classic signs of Unicode treated as Latin-1 by mistake. The multibyte characters often show up as Ã followed by some other random character. This actually happened to my conference badge in Asheville, which read "EsmÃ© Cowles".
-Esmé
--
Esme Cowles
"Necessity is the plea for every infringement of human freedom. It is the
argument of tyrants; it is the creed of slaves." -- William Pitt, 1783
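The mix-up described above is easy to reproduce. A minimal sketch using Perl's core Encode module, where misread_as_latin1 is a hypothetical helper name: the text is encoded as UTF-8, then the resulting bytes are wrongly decoded as Latin-1.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encode text as UTF-8, then (wrongly) decode those bytes as Latin-1.
sub misread_as_latin1 {
    my ($text) = @_;
    my $utf8_bytes = encode('UTF-8', $text);      # e-acute becomes 0xC3 0xA9
    return decode('ISO-8859-1', $utf8_bytes);     # each byte becomes one character
}

my $misread = misread_as_latin1("Esm\x{E9}");

# Print code points so the result does not depend on the terminal's encoding.
print join(' ', map { sprintf 'U+%04X', ord } split //, $misread), "\n";
# prints: U+0045 U+0073 U+006D U+00C3 U+00A9
```

The single character U+00E9 comes back as two characters, U+00C3 and U+00A9, which is the same \xC3 byte that shows up in the Encode.pm error.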
On Oct 7, 2010, at 9:39 AM, Ross Singer wrote:
> Eric, is this your source file?
>
> http://ia341306.us.archive.org/1/items/bienfaitsducatho00pina/bienfaitsducatho00pina_meta.mrc
>
> I have nothing really much to offer with regard to MARC.pm and its
> ilk, but I thought it might help people track down your problem.
>
> FWIW, yaz-marcdump spits out this on that record:
>
> $ yaz-marcdump bienfait.mrc
> 00795cam a2200277 a 4500
> 001 1556719
> 003 CaOTULAS
> 005 19931129144435.0
> 008 780210s1842 fr fre d
> 035 $a (Sirsi) AZF-9578
> 040 $a NUC $c NUC $d otsm
> 049 $a otstm $b eng
> 050 04 $a BX946 $b .P5
> 055 3 $a BX946 $b .P55 1842
> 090 8 $a BX 946 .P55 1842 $b SMRS
> 100 10 $a Pinard, Clovis, $d d.1865.
> 245 10 $a Bienfaits du Catholicisme dans la société / $c par l'abbé P
> (No separator at end of field length=71)
> 260 na $d .
> (Separator but not at end of field length=26)
> 300 18 $2 .
> (Separator but not at end of field length=11)
> 490 00 $p .
> (Separator but not at end of field length=45)
> (Bad indicator data. Skipping 2 bytes)
> 596 ?t $e nn
> (No separator at end of field length=7)
> 610 ne
> (Separator but not at end of field length=30)
> 948 xH $s tory.
> (Separator but not at end of field length=27)
> 039 0/ $6 /199
> (No separator at end of field length=9)
> (Bad indicator data. Skipping 1 bytes)
> 093 0 $f mcsk
> (Separator but not at end of field length=21)
> 926 12 $1 44434
> (Separator but not at end of field length=48)
>
> The diacritics definitely look pretty sketchy there.
>
> In fact, I just tried this with every encoding in yaz-marcdump, and
> the diacritics never properly converted to UTF-8.
>
> They seem ok here:
>
> http://ia341306.us.archive.org/1/items/bienfaitsducatho00pina/bienfaitsducatho00pina_marc.xml
>
> though, so you might want to grab both binary marc and marcxml and
> fall back to the latter in case of encoding errors.
>
> -Ross.
>
> On Thu, Oct 7, 2010 at 6:51 AM, Eric Lease Morgan wrote:
>> How do I trap for unwanted (bogus) characters in MARC records?
>>
>> I have a set of Internet Archive identifiers, and have written the following Perl loop to get the MARC records associated with each one:
>>
>> # process each identifier
>> my $ua = LWP::UserAgent->new( agent => AGENT );
>> while ( <DATA> ) {
>>
>> # get the identifier
>> chop;
>> my $identifier = $_;
>> print $identifier, "\n";
>>
>> # get its corresponding MARC record
>> my $response = $ua->get( ROOT . "$identifier/$identifier" . "_meta.mrc" );
>> if ( ! $response->is_success ) {
>>
>> warn $response->status_line;
>> next;
>>
>> }
>>
>> # save it
>> open MARC, " > $identifier.mrc" or die "Can't open $identifier.mrc: $!\n";
>> binmode MARC, ":utf8";
>> print MARC $response->content;
>> close MARC;
>>
>> }
>>
>> I then use the venerable marcdump to see the fruits of my labors: marcdump *.mrc. Unfortunately, marcdump returns the following error against (at least) one of my files:
>>
>> bienfaitsducatho00pina.mrc
>> utf8 "\xC3" does not map to Unicode at /System/Library/
>> Perl/5.10.0/darwin-thread-multi-2level/Encode.pm line 162.
>>
>> What is going on here? Am I saving my files incorrectly? Is the original MARC data inherently incorrect? Is there some way I can fix the MARC record in question?
>>
>> --
>> Eric Lease Morgan
>>
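On the question of whether the files are being saved incorrectly: $response->content returns the raw bytes of the HTTP body, and printing bytes through a handle with the :utf8 layer re-encodes every byte above 0x7F as a two-byte sequence, so the saved copy no longer matches what was downloaded. A minimal sketch of writing the bytes untouched is below. The save_raw helper and the file name are hypothetical, and the thread does not establish that this alone clears the marcdump error, since yaz-marcdump also complains about the record in Ross's test.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write bytes to a file exactly as received.
sub save_raw {
    my ($path, $bytes) = @_;
    open my $fh, '>', $path or die "Can't open $path: $!\n";
    binmode $fh;                 # raw bytes; no :utf8 layer
    print {$fh} $bytes;
    close $fh or die "Can't close $path: $!\n";
    return -s $path;             # size on disk, in bytes
}

# Two bytes of UTF-8 (an e-acute) stand in for a downloaded record.
my $dir  = tempdir(CLEANUP => 1);
my $size = save_raw("$dir/example.mrc", "\xC3\xA9");
print "wrote $size bytes\n";     # prints: wrote 2 bytes
```

With binmode $fh, ":utf8" in place of the plain binmode, the same two bytes would land on disk as four. Separately, chomp is the safer choice than chop for stripping the newline from each identifier, because chop removes the last character whatever it is.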