On Jan 23, 2009, at 5:52 AM, Eric Lease Morgan wrote:
> On 1/23/09 4:39 AM, "Brown, Alan" <[log in to unmask]> wrote:
>
>>> Does anybody here know the difference between MARC21 and USMARC?
>>>
>>> I am munging sets of MARC bibliographic data from a III catalog with
>>> holdings data from the same. I am using MARC::Batch to read my bib'
>>> data (with both strict and warnings turned off), insert 853 and 863
>>> fields, and write the data using the as_usmarc method. Therefore, I
>>> think I am creating USMARC files. I can then use marcdump to... dump
>>> the records. It returns 0 errors.
>>
>> Eric, This isn't an encoding thing is it? I know that a number of III
>> catalogues still encode their diacritics using the MARC8 version of
>> USMARC. We have changed ours to Unicode now, but we did have an issue
>> of the catalogue outputting unicode records that weren't tagged as
>> such in the leader and so couldn't be identified as proper MARC21
>> (current version of USMARC). III have solved this with their latest
>> release. This issue had me scratching my head with a lot of my
>> MARC::Record scripts, but generally they failed quite spectacularly.
>
>
> Actually, I believe I am suffering from a number of different types of
> errors in my MARC data: 1) encoding issues (MARC8 versus UTF-8), 2)
> syntactical errors (lack of periods, invalid choices of indicators,
> etc.), 3) incorrect data types (strings entered into fields denoted
> for integers, etc.) Just about the only thing I haven't encountered
> are structural errors such as invalid leader, and this doesn't even
> take into account possible data entry errors (author is Franklin when
> Twain was entered).
>
> Yes, I do have an encoding issue. All of my incoming records are in
> MARC8. I'm not sure, but I think the Primo tool expects UTF-8. I can
> easily update the encoding bit (change leader position 09 from blank
> to a), but this does not change any actual encoding in the
> bibliographic section of my data. Consequently, after updating the
> encoding bit and looping through my munged data, MARC::Record chokes
> with the following error on records where UTF-8 is denoted but which
> include MARC8 characters:
>
> utf8 "\xE8" does not map to Unicode at
> /usr/lib/perl5/5.8.8/i686-linux/Encode.pm line 166.
>
> Upon looking at the raw MARC, I see that the offending record includes
> the word Münich. What can I do to transform MARC8 data into UTF-8?
> What can I do to trap the error above, and skip these invalid records?
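
On the "trap the error and skip" question, one rough way to find the
offending records before they reach MARC::Record is to check whether
each record's bytes decode as strict UTF-8. The sketch below is only an
illustration in Python, not something from this thread: the function
name split_valid_utf8 is hypothetical, the sample bytes are toy strings
rather than real MARC records, and it assumes only that records end
with the MARC record terminator byte 0x1D.

```python
RECORD_TERMINATOR = b"\x1d"  # MARC end-of-record byte


def split_valid_utf8(raw):
    """Split a raw MARC byte stream into records and sort them by
    whether their bytes decode as strict UTF-8.

    Hypothetical helper for illustration; it does not parse the
    leader or directory, it only tests the bytes.
    """
    good, bad = [], []
    for chunk in raw.split(RECORD_TERMINATOR):
        if not chunk:
            continue  # skip the empty piece after the final terminator
        record = chunk + RECORD_TERMINATOR
        try:
            record.decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            bad.append(record)  # e.g. a stray MARC-8 byte such as 0xE8
        else:
            good.append(record)
    return good, bad


if __name__ == "__main__":
    # Toy data only: a MARC-8 style umlaut byte (0xE8) placed before
    # the letter it modifies, followed by a genuine UTF-8 string.
    sample = b"M\xe8unich\x1d" + "Zürich".encode("utf-8") + b"\x1d"
    good, bad = split_valid_utf8(sample)
    print(len(good), "records decode as UTF-8;", len(bad), "do not")
```

Pure-ASCII MARC8 records pass this check, which is harmless because
ASCII bytes are identical in both encodings. It is a heuristic, though:
a MARC8 byte sequence could in principle happen to be valid UTF-8.
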
We've had good luck with the yaz-marcdump utility that's included with
the YAZ toolkit. We're using it to convert our exported Horizon
records from MARC8 to UTF-8 before we import into AquaBrowser. The
tool is easy to compile, blindingly fast, forgiving of common MARC
errors, and changes the coding correctly. It's been serving us well.
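
As a sketch of what such a conversion can look like when scripted, the
Python below wraps a yaz-marcdump call. The options follow the example
in the yaz-marcdump documentation (-f and -t for source and target
character sets, -o marc for ISO 2709 output, -l 9=97 to set leader
position 09 to "a"); they are not necessarily the exact invocation we
run here. The function names and file names are hypothetical, and the
script assumes yaz-marcdump is on the PATH.

```python
import subprocess


def yaz_marc8_to_utf8_command(src):
    """Build the argument list for a MARC-8 to UTF-8 conversion.

    Hypothetical helper; options taken from the yaz-marcdump
    documentation's own example.
    """
    return [
        "yaz-marcdump",
        "-f", "MARC-8",   # character set of the input records
        "-t", "UTF-8",    # character set to convert to
        "-o", "marc",     # write binary MARC (ISO 2709), not a text dump
        "-l", "9=97",     # set leader position 09 to chr(97), i.e. "a"
        src,
    ]


def convert(src, dst):
    """Run the conversion, writing yaz-marcdump's stdout to dst."""
    with open(dst, "wb") as out:
        subprocess.run(yaz_marc8_to_utf8_command(src), stdout=out, check=True)


if __name__ == "__main__":
    # File names are placeholders for illustration.
    convert("export-marc8.mrc", "export-utf8.mrc")
```

Setting the leader in the same pass is the point of -l 9=97: the
record's declared encoding and its actual bytes change together, which
is the mismatch described earlier in this thread.
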
-Tod
Tod Olson <[log in to unmask]>
Systems Librarian
University of Chicago Library