I am not familiar with that Perl module. But I'm more familiar than I'd
want with char encoding in Marc.
I don't recognize the byte 0xC2 (there are some bytes I became
pathetically familiar with in past debugging, but I've forgotten 'em),
but the first things to look at:
1. Is your Marc file encoded in Marc8 or UTF-8? I'm betting Marc8.
Theoretically there is a Marc leader byte (position 9) that tells you
whether it's Marc8 or UTF-8, but the leader byte is often wrong in real
world records. Is it wrong?
2. Does Perl MARC::Batch have a function to convert from Marc8 to
UTF-8? If so, how does it decide whether to convert? Is it trying to
do that? Is it assuming that the leader byte in the record accurately
identifies the encoding, and if so, is the leader byte wrong? Is it
trying to convert from Marc8 to UTF-8, when the source was UTF-8 in the
first place? Or is it assuming the source was UTF-8 in the first place,
when in fact it was Marc8?
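
One way to check both questions above without going through MARC::Batch
at all is to look at the raw bytes. What follows is only a rough sketch
in plain Python, not anything from the Perl module. It assumes the file
is ordinary ISO 2709 MARC (records ending in 0x1D), that leader position
9 is 'a' for UTF-8 and blank for Marc8, and the function names are made
up for illustration. The "is it really UTF-8" test is a heuristic:
Marc8 data can occasionally happen to be valid UTF-8.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def split_records(data: bytes) -> list:
    """Split raw MARC data into records on the record terminator."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR)
            if chunk.strip()]


def declared_encoding(record: bytes) -> str:
    """Report what leader position 9 claims about the encoding."""
    flag = record[9:10]
    if flag == b"a":
        return "UTF-8"
    if flag == b" ":
        return "MARC-8"
    return "unknown"


def decodes_as_utf8(record: bytes) -> bool:
    """True if the whole record is a valid UTF-8 byte sequence."""
    try:
        record.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def diagnose(record: bytes) -> str:
    """Compare the leader's claim with what the bytes look like."""
    declared = declared_encoding(record)
    valid = decodes_as_utf8(record)
    has_high_bytes = any(b >= 0x80 for b in record)
    if declared == "UTF-8" and not valid:
        # Leader says UTF-8, bytes disagree: probably Marc8 in disguise.
        return "leader says UTF-8, bytes are not valid UTF-8"
    if declared == "MARC-8" and valid and has_high_bytes:
        # Leader says Marc8, but the non-ASCII bytes form valid UTF-8.
        return "leader says MARC-8, bytes look like UTF-8"
    return "leader and bytes are consistent"
```

To try it on a real file, read the file in binary mode, pass the bytes
to split_records(), and print diagnose() for each record. A record that
reports "leader says UTF-8, bytes are not valid UTF-8" is the kind that
would plausibly produce an error like the one quoted below.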
Not the answer you wanted; maybe someone else will have that. Debugging
char encoding is hands down the most annoying kind of debugging I ever do.
On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:
> Ack! While using the venerable Perl MARC::Batch module I get the following error while trying to read a MARC record:
>
> utf8 "\xC2" does not map to Unicode
>
> This is a real pain, and I'm hoping someone here can help me either: 1) trap this error allowing me to move on, or 2) figure out how to open the file "correctly".
>