Hi Tim (and apologies to everyone for being so chatty on this topic),
After your post, I've had your question in the back of my mind while
working on other things -- the problem of identifying/improving low-quality
data is interesting given our growing reliance on publisher data.
References to the author by name and/or the reader in the 520 have been
reliable indicators of promotional rather than summary content in records
that have crossed my path since your post.
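
For anyone who wants to try this against their own records, here is a rough
sketch of those two tells in Python. It is only an illustration, not something
I have run over a real corpus: the function names, the word list, and the
sample strings are all made up for the example, and it assumes you have
already pulled the 520 $a text and the 100 $a heading out of each record as
plain strings.

```python
import re
from typing import List

# Words that address the reader directly. This list is an assumption and
# certainly not exhaustive; "you'll" and "you're" are caught via "you".
SECOND_PERSON = re.compile(r"\b(you|your|yours|yourself)\b", re.IGNORECASE)


def author_surname(heading: str) -> str:
    """Return the part of a 'Surname, Forename' heading before the first comma."""
    return heading.split(",", 1)[0].strip().rstrip(".")


def promotional_signals(summary: str, author_heading: str = "") -> List[str]:
    """List which of the two tells appear in a 520 summary string.

    An empty list means neither tell was found; it does not prove the
    summary is librarian-written.
    """
    signals = []
    if SECOND_PERSON.search(summary):
        signals.append("addresses_reader")
    surname = author_surname(author_heading)
    # Case-sensitive whole-word match. Surnames that are also ordinary
    # words or common names will still produce false positives.
    if surname and re.search(r"\b" + re.escape(surname) + r"\b", summary):
        signals.append("names_author")
    return signals


# Invented example strings, not taken from real records.
print(promotional_signals(
    "In this thrilling debut, Smith takes you on a journey.", "Smith, Jane,"))
print(promotional_signals(
    "A history of the Erie Canal from 1817 to 1825.", "Smith, Jane"))
```

I would treat a hit as a reason to look at the record, not as a verdict:
summaries of biographies or of works about the author will trip the name
check for perfectly neutral text.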
On Thu, Sep 19, 2019 at 9:50 AM Tim Spalding <[log in to unmask]> wrote:
> [I also put this on AUTOCAT. Apologies if you also follow that. This
> falls at the intersection of hand-cataloging, data processing and
> simple AI.]
> I wonder if anyone has thoughts on the best way to identify the source
> of summary/description data (520s) across a large corpus of MARC records.
> My primary goal is to distinguish between more neutral,
> librarian-written summaries, and the more promotional summaries
> derived from publisher sources, whether typed in from flap copy or
> produced by ONIX-MARC conversion. I can see a number of uses for this
> distinction; one is that members of LibraryThing much prefer short,
> neutral descriptions, and abhor the lengthy purple prose of many
> publisher descriptions.
> So far I have a few ideas, but I'd love your thoughts on more:
> The 520 $c (and $u, $2) ought to have source information. But it's
> rarely filled out. Are there any other "tells"?