That's hilarious that Terry has had to do enough ugliness with Marc
encodings that he can recognize 0xC2 off the bat as the Marc8 character
it represents! I am in awe, as well as sympathy.

If the record is in Marc8, then you need to know whether Perl MARC::Batch
can handle Marc8. If it's supposed to be able to handle it, you need to
figure out why it's not. (Does the leader byte say UTF-8 even though the
record is really Marc8?) If MARC::Batch can't handle Marc8, you need to
convert to UTF-8 first.

The only software package I know of that can convert from and to Marc8
is Java Marc4J, but I wouldn't be shocked if there was something in Perl
to do it. (But yes, as you can tell by the name, "Marc8" is a character
encoding used ONLY in Marc; nobody but library people writes software
for dealing with it.)

On 4/6/2011 5:01 PM, Reese, Terry wrote:
> I'd echo Jonathan's question -- the 0xC2 code is the sound recording
> marker in MARC-8. I'd guess the file isn't in UTF8.
>
> --TR
>
>> -----Original Message-----
>> From: Code for Libraries On Behalf Of Jonathan Rochkind
>> Sent: Wednesday, April 06, 2011 1:28 PM
>> Subject: Re: [CODE4LIB] utf8 "\xC2" does not map to Unicode
>>
>> I am not familiar with that Perl module. But I'm more familiar than I'd
>> want with char encoding in Marc.
>>
>> I don't recognize the byte 0xC2 (there are some bytes I became
>> pathetically familiar with in past debugging, but I've forgotten 'em),
>> but the first things to look at:
>>
>> 1. Is your Marc file encoded in Marc8 or UTF-8? I'm betting Marc8.
>> Theoretically there is a Marc leader byte that tells you whether it's
>> Marc8 or UTF-8, but the leader byte is often wrong in real world
>> records. Is it wrong?
>>
>> 2. Does Perl MARC::Batch have a function to convert from Marc8 to
>> UTF-8? If so, how does it decide whether to convert? Is it trying to
>> do that? Is it assuming that the leader byte of the record accurately
>> identifies the encoding, and if so, is the leader byte wrong? Is it
>> trying to convert from Marc8 to UTF-8, when the source was UTF-8 in the
>> first place? Or is it assuming the source was UTF-8 in the first place,
>> when in fact it was Marc8?
>>
>> Not the answer you wanted; maybe someone else will have that. Debugging
>> char encoding is hands down the most annoying kind of debugging I ever do.
>>
>> On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:
>>> Ack! While using the venerable Perl MARC::Batch module I get the
>>> following error while trying to read a MARC record:
>>>
>>>   utf8 "\xC2" does not map to Unicode
>>>
>>> This is a real pain, and I'm hoping someone here can help me either:
>>> 1) trap this error allowing me to move on, or 2) figure out how to
>>> open the file "correctly".
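
For the "is the leader byte wrong?" question, here is a rough diagnostic sketch in plain Python (standard library only, so it does not depend on MARC::Batch or any MARC package). It relies on two things: leader position 09 is "a" for Unicode records and blank for Marc8, and a record that claims UTF-8 should decode cleanly as UTF-8. The function name diagnose_record and the sample bytes are made up for illustration, and the sketch only reports a mismatch; it does not convert Marc8.

```python
def diagnose_record(raw: bytes) -> dict:
    """Compare the encoding the leader claims with what the bytes look like.

    `raw` is one complete MARC record as bytes (leader, directory, fields).
    Hypothetical helper for illustration, not part of any MARC library.
    """
    if len(raw) < 24:
        raise ValueError("a MARC leader is 24 bytes; this record is too short")

    # Leader position 09: "a" means UCS/Unicode, blank means Marc8.
    flag = raw[9:10]
    if flag == b"a":
        claimed = "UTF-8"
    elif flag == b" ":
        claimed = "MARC-8"
    else:
        claimed = "unknown"

    # Independently test whether the bytes are well-formed UTF-8.
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {
        "claimed": claimed,
        "valid_utf8": valid_utf8,
        # The suspicious case from this thread: leader says UTF-8,
        # but the bytes cannot be UTF-8 (for example a lone 0xC2).
        "leader_mismatch": claimed == "UTF-8" and not valid_utf8,
    }


if __name__ == "__main__":
    # Invented sample: leader claims UTF-8, body holds a bare 0xC2 byte.
    sample = b"00000nam a2200000 a 4500" + b"\xc2 2011 Example\x1e\x1d"
    print(diagnose_record(sample))
```

If leader_mismatch comes back true for the failing record, the leader is lying, and converting from Marc8 before parsing (or fixing the leader flag) would be the thing to try next. Note that a pure-ASCII record passes the UTF-8 check whichever encoding it claims, so a clean result proves nothing on its own.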