On Fri, Jan 27, 2012 at 6:26 PM, Roy Tennant wrote:
> Oh, I should have also mentioned that some of the worst problems occur
> when people treat their metadata like it will never leave their
> institution. When that happens you get all kinds of crazy cruft in a
> record. For example, just off the top of my head:
> * Embedded HTML markup (one of my favorites is an <img> tag)
> * URLs to remote resources that are hard-coded to go through a
> particular institution's proxy
> * Notes that only have meaning for that institution
> * Text that is meant to display to the end-user but may only do so in
> certain systems; e.g., "Click here" in a particular subfield.
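Some of the cruft in the list above can be caught mechanically before records leave the building. Below is a minimal Python sketch of that kind of check. It is not anything OCLC or WorldCat actually runs. The regular expressions and the `find_cruft` helper are hypothetical, and the sketch assumes that proxied links carry the real target in a `url` (or `qurl`) query parameter, as EZproxy-style starting-point URLs commonly do. Notes that only make sense locally can't be detected this way and still need a human eye.

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical heuristics; tune them against your own records.
HTML_TAG = re.compile(r"<\s*/?\s*[a-zA-Z][a-zA-Z0-9]*\b[^>]*>")  # e.g. <img src=...>
DISPLAY_TEXT = re.compile(r"\bclick here\b", re.IGNORECASE)      # UI text baked into data
URL = re.compile(r"https?://\S+")
PROXY_PARAMS = {"url", "qurl"}  # assumption: EZproxy-style "login?url=<target>"


def find_cruft(value: str) -> list[str]:
    """Return sorted labels for institution-specific cruft found in one field value."""
    problems = set()
    if HTML_TAG.search(value):
        problems.add("embedded-html")
    if DISPLAY_TEXT.search(value):
        problems.add("display-text")
    for url in URL.findall(value):
        # A proxied link wraps the real target in a query parameter.
        if PROXY_PARAMS.intersection(parse_qs(urlparse(url).query)):
            problems.add("proxied-url")
    return sorted(problems)
```

Run over a batch of 856 $u values or note fields, this gives a rough tally of each kind of problem. Expect some false positives: an angle-bracketed URL such as `<https://example.org>` also looks like a tag to `HTML_TAG`.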
> On Fri, Jan 27, 2012 at 4:17 PM, Roy Tennant wrote:
> > Thanks a lot for the kind shout-out, Leslie. I have been pondering what
> > I might propose to discuss at this event, since there is certainly
> > plenty of fodder. Recently we (OCLC Research) did an investigation of
> > 856 fields in WorldCat (some 40 million of them) and that might prove
> > interesting. By the time ALA rolls around there may be something else
> > entirely I could talk about.
> > That's one of the wonderful things about having 250 million MARC
> > records sitting out on a 32-node cluster. There are any number of
> > potentially interesting investigations one could do.
> > Roy
> > On Thu, Jan 26, 2012 at 2:10 PM, Johnston, Leslie wrote:
> >> Roy's fabulous "Bitter Harvest" paper:
> >> -----Original Message-----
> >> From: Code for Libraries On Behalf Of Walter Lewis
> >> Sent: Wednesday, January 25, 2012 1:38 PM
> >> Subject: Re: [CODE4LIB] Metadata war stories...
> >> On 2012-01-25, at 10:06 AM, Becky Yoose wrote:
> >>> - Dirty data issues when switching discovery layers or using
> >>> legacy/vendor metadata (ex. HathiTrust)
> >> I have a sharp recollection of a slide in a presentation Roy Tennant
> >> gave at Access (in Halifax, maybe), where he showed a range of dates
> >> extracted from an array of OAI-harvested records: the good, the bad,
> >> the incomprehensible, the useless-without-context (01/02/03, anyone?),
> >> and on and on. In my years of migrating data, I've seen most of those
> >> variants (except ones *intended* to be BCE).
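To make the "01/02/03, anyone?" point concrete: the same string parses cleanly under several common field orders, so without knowing the cataloguer's local convention there is no principled way to pick one. The small Python illustration below uses only the standard library. The list of candidate formats and the `interpretations` helper are assumptions for illustration. Note also that `%y` quietly chooses a century (Python maps two-digit years 00 to 68 onto 2000 to 2068, and 69 to 99 onto 1969 to 1999), which is yet another guess layered on top.

```python
from datetime import datetime

# Hypothetical set of field orders a cataloguer might have meant. Which one
# applies depends on local convention, which the record itself does not carry.
CANDIDATE_FORMATS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}


def interpretations(raw: str) -> dict[str, str]:
    """Return every ISO 8601 date that `raw` could plausibly denote."""
    readings = {}
    for label, fmt in CANDIDATE_FORMATS.items():
        try:
            readings[label] = datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            pass  # this reading is impossible (e.g. month 25)
    return readings


print(interpretations("01/02/03"))
# {'month/day/year': '2003-01-02', 'day/month/year': '2003-02-01', 'year/month/day': '2001-02-03'}
```

Only when exactly one reading survives (as with "25/12/99", which is valid solely as day/month/year) can the value be normalized without outside context.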
> >> Then there are the fielded data sets without authority control. My
> >> favourite example comes from staff who nominally worked for me, so I'm
> >> not telling tales out of school. The classic Dynix product had a
> >> Newspaper index module that we used before migrating it (PICK
> >> migrations; such a joy). One title had twenty variations on
> >> "Georgetown Independent" (I wish I was kidding), and the dates ranged
> >> from the early ninth century until nearly the 3rd millennium.
> >> (Apparently there hasn't been much change in the local council over
> >> the centuries.)
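Variant headings like those twenty "Georgetown Independent"s, and out-of-range dates, can at least be surfaced automatically. Below is a minimal Python sketch loosely modelled on the "fingerprint" key-collision method offered by OpenRefine's clustering (it is not a port of it). Titles that differ only in case, punctuation, spacing, or word order get the same key. The function names, sample titles, and year bounds are all hypothetical; real bounds would come from the publication's actual run.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(title: str) -> str:
    """Normalize a heading so trivially different variants share one key."""
    # Fold accented characters to ASCII, lowercase, and turn punctuation into spaces.
    folded = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^\w\s]", " ", folded.lower()).split()
    # Sorting and de-duplicating tokens makes word order irrelevant.
    return " ".join(sorted(set(tokens)))


def cluster_variants(titles: list[str]) -> dict[str, list[str]]:
    """Group titles by fingerprint; keep only keys with more than one spelling."""
    clusters = defaultdict(list)
    for title in titles:
        clusters[fingerprint(title)].append(title)
    return {key: forms for key, forms in clusters.items() if len(set(forms)) > 1}


def out_of_range(years: list[int], earliest: int, latest: int) -> list[int]:
    """Return years outside a plausible span (bounds are hypothetical)."""
    return [y for y in years if not earliest <= y <= latest]


# Hypothetical sample data.
titles = ["Georgetown Independent", "GEORGETOWN INDEPENDENT.",
          "Independent, Georgetown", "Georgetown  Independent", "Example Herald"]
print(cluster_variants(titles))
print(out_of_range([850, 1923, 1987, 2995], earliest=1800, latest=2012))
# [850, 2995]
```

Misspellings will not collide under this key; catching those needs something fuzzier, such as an edit-distance comparison, and a person still has to decide which form is authoritative.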
> >> I've come to the point where I hand-walk the spatial metadata, linking
> >> it to geonames.org for the linked open data. I've never had to do it
> >> for a set with more than 40,000 entries, though. The good news is that
> >> it isn't hard to establish a valid additional entry when one is
> >> required.
> >> Walter