LISTSERV 16.5 - CODE4LIB Archives

I'm out of my depth here, but I'm curious how this all works. Is it true
that MARC-8 records are supposed to include an 066 field that declares any
non-Latin character sets in use? I'm drawing this conclusion from a few
pages on the LoC website, which mention ANSEL as one of the cases where
this might be necessary.

http://www.loc.gov/marc/specifications/speccharucs.html#field066
http://www.loc.gov/marc/specifications/speccharconversion.html#escape
http://www.loc.gov/marc/bibliographic/bd066.html


On Thu, Mar 8, 2012 at 1:02 PM, Godmar Back <[log in to unmask]> wrote:

> Hi,
>
> a few days ago, I showed pymarc to a group of technical librarians to
> demonstrate how easily certain tasks can be scripted/automated.
>
> Unfortunately, it blew up at me when I tried to write a record:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9:
> ordinal not in range(128)
>
> Investigation revealed this culprit:
>
> =LDR  00916nam a2200241I  4500
> =001  ocm10685946
> =005  19880203211447.0
> =007  cr\bn||||||abp
> =007  cr\bn||||||cda
> =008  840503s1939\\\\gw\\\\\\\\\\\\00010\ger\d
> =040  \\$aMBB$cMBB$dCRL
> =049  \\$aCRLL
> =100  10$aEsser, Hermann,$d1900-
> =245  14$aDie j<E8>udischer Weltpest ;$bjudend<E1>ammerung auf dem
> Erdball,$cvon Hermann Esser.
> =260  0\$aM<E8>unchen,$bZentralverlag der N S D A P., F. Eher
> ahchf.,$c1939.
> =300  \\$a243 [1] p.$c23 cm.
> =533  \\$aAlso available as electronic reproduction.$bChicago :$cCenter for
> Research Libraries,$d[2009]
> =650  \0$aJewish question.
> =700  12$aBierbrauer, Johann Jacob,$d1705-1760?
> =710  2\$aCenter for Research Libraries (U.S.)
> =856  41$uhttp://dds.crl.edu/CRLdelivery.asp?tid=10538$zOnline version
> =907  \\$a.b28931622$b08-30-10$c08-30-10
> =998  \\$awww$b08-30-10$cm$dz$e-$fger$ggw $h4$i0
>
> Leader position 9 is set to 'a', so the record should contain
> UTF-8-encoded Unicode [1], but E8 75 in the 245$a appears to be ANSEL, where
> 'E8' denotes the umlaut preceding the lowercase 'u' (0x75). [2]
>
> To me, this record looks misencoded... am I correct here? There are
> thousands of such records in the data set I'm dealing with, which was
> obtained using the 'Data Exchange' feature of III's Millennium system.
>
> My question is how others, especially pymarc users dealing with III
> records, handle this issue, and what other
> experiences/hints/practices/kludges exist in this area.
>
> Thanks.
>
>  - Godmar
>
> [1] http://www.loc.gov/marc/bibliographic/bdleader.html
> [2] http://lcweb2.loc.gov/diglib/codetables/45.html
>
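
For anyone wanting to check the mismatch described above, here is a rough sketch in plain Python (no pymarc required). It tests whether leader position 9 claims Unicode, whether the bytes are actually valid UTF-8, and what the bytes look like when read as ANSEL instead. The function names and the two-entry byte table are my own illustrative assumptions, not pymarc's API. The table covers only the two combining bytes that appear in the quoted record; a real conversion needs the full MARC-8 code tables, which pymarc has its own support for.

```python
import unicodedata

# Partial, illustrative map of ANSEL combining bytes to Unicode combining
# characters. Only the two bytes seen in the quoted record are included.
ANSEL_COMBINING = {
    0xE1: "\u0300",  # grave accent
    0xE8: "\u0308",  # umlaut / diaeresis
}


def declares_utf8(leader: str) -> bool:
    """Return True if leader position 9 is 'a' (UCS/Unicode)."""
    return len(leader) > 9 and leader[9] == "a"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_ansel_sketch(data: bytes) -> str:
    """Decode ASCII plus the few ANSEL combining bytes mapped above.

    ANSEL stores a diacritic before its base letter, while Unicode stores
    the combining mark after it, so pending marks are emitted after the
    next base character. The result is normalized to NFC.
    """
    out = []
    pending = []
    for byte in data:
        if byte in ANSEL_COMBINING:
            pending.append(ANSEL_COMBINING[byte])
        elif byte < 0x80:
            out.append(chr(byte))
            out.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"byte 0x{byte:02X} is not covered by this sketch")
    if pending:
        raise ValueError("combining byte with no base character after it")
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    leader = "00916nam a2200241I  4500"  # leader from the quoted record
    raw = b"M\xe8unchen,"                # 260$a bytes as shown in the dump
    print(declares_utf8(leader), is_valid_utf8(raw))
    print(decode_ansel_sketch(raw))
```

With the quoted record, the leader claims Unicode while the 260$a bytes fail UTF-8 validation, which is consistent with Godmar's reading that the data is MARC-8 carrying the wrong leader flag. Read the same way, the E1 in the 245$b is a grave accent rather than an umlaut, so that byte may simply be a keying error in the source record.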