LISTSERV 16.5 - CODE4LIB Archives

I am not familiar with that Perl module. But I'm more familiar than I'd
want with char encoding in Marc.

I don't recognize the byte 0xC2 (there are some bytes I became
pathetically familiar with in past debugging, but I've forgotten 'em),
but the first things to look at:

1. Is your Marc file encoded in Marc8 or UTF-8? I'm betting Marc8.
Theoretically the Marc leader byte at position 09 tells you which it is
('a' for UTF-8, blank for Marc8), but that byte is often wrong in real
world records. Is it wrong?

2. Does Perl MARC::Batch have a function to convert from Marc8 to
UTF-8? If so, how does it decide whether to convert? Is it trying to
do that? Is it assuming that the leader byte of the record accurately
identifies the encoding, and if so, is the leader byte wrong? Is it
trying to convert from Marc8 to UTF-8, when the source was UTF-8 in the
first place? Or is it assuming the source was UTF-8 in the first place,
when in fact it was Marc8?
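
To make those two checks concrete, here is a rough sketch of how you might compare what the leader claims against what the bytes actually look like. It is plain Python using only the standard library, not MARC::Batch, and the function names and result labels are made up for illustration. It assumes you hand it the raw bytes of one record, and it only tells you whether the leader and the bytes agree; it does not convert anything.

```python
def leader_says_utf8(record: bytes) -> bool:
    """Leader position 09: 'a' means UCS/Unicode, blank means MARC-8."""
    return record[9:10] == b"a"


def is_valid_utf8(record: bytes) -> bool:
    """True if the whole record decodes cleanly as UTF-8."""
    try:
        record.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def diagnose(record: bytes) -> str:
    """Return a hypothetical label describing leader vs. actual bytes."""
    declared_utf8 = leader_says_utf8(record)
    valid_utf8 = is_valid_utf8(record)
    has_high_bytes = any(b >= 0x80 for b in record)

    if not has_high_bytes:
        # Pure ASCII reads the same in both encodings.
        return "ascii-only"
    if declared_utf8 and valid_utf8:
        return "utf8-ok"
    if declared_utf8 and not valid_utf8:
        # Leader claims UTF-8 but the bytes are not; possibly MARC-8.
        return "leader-says-utf8-but-invalid"
    if valid_utf8:
        # Leader claims MARC-8 but the bytes decode as UTF-8.
        return "leader-says-marc8-but-looks-utf8"
    return "marc8-plausible"
```

If you read the whole file in as bytes and split it on the record terminator 0x1D, you can run diagnose() on each piece and count the labels. A pile of "leader-says-utf8-but-invalid" results would suggest the leader byte is lying, which is the kind of input that could produce a "does not map to Unicode" error.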

Not the answer you wanted; maybe someone else will have that. Debugging
char encoding is hands down the most annoying kind of debugging I ever do.

On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:
> Ack! While using the venerable Perl MARC::Batch module I get the following error while trying to read a MARC record:
>
>    utf8 "\xC2" does not map to Unicode
>
> This is a real pain, and I'm hoping someone here can help me either: 1) trap this error allowing me to move on, or 2) figure out how to open the file "correctly".
>