On Wed, Jul 7, 2010 at 7:00 PM, Doran, Michael D <[log in to unmask]> wrote:
> Of course, subfield $3 values are not any kind of controlled vocabulary, so it's hard to do much with them programmatically.

A few years ago I analyzed the subfield 3 values in the Library of
Congress data up at the Internet Archive [1]. The extraction code is
really simple, but I just pushed it up to GitHub, mainly to share
the results [2].

I extracted all the subfield 3 values from the ~12M records, and then
counted them up to see how often they repeated [3]. As you can see
it's hardly controlled, but it might be worthwhile coming up with some
simple heuristics and properties for the familiar ones: you could
imagine dcterms:description being used for "Publisher description",
etc.
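
To make the counting and the heuristics idea concrete, here is a minimal
sketch in Python. It is not the code from [2]. It assumes the $3 strings
have already been pulled out of the records into a list, and the
normalize(), tally(), and map_relation() helpers and the KNOWN_RELATIONS
table are hypothetical names made up for illustration. Only the
"Publisher description" to dcterms:description pairing comes from the
suggestion above; the second entry is just an assumed example.

```python
from collections import Counter

# Hypothetical lookup from normalized $3 text to a property.
# Only the first pairing is from the message above; the second
# is an illustrative assumption, not an agreed vocabulary.
KNOWN_RELATIONS = {
    "publisher description": "dcterms:description",
    "table of contents": "dcterms:tableOfContents",
}


def normalize(value):
    """Collapse whitespace, drop trailing punctuation, and lowercase."""
    return " ".join(value.split()).rstrip(" .:;,").lower()


def tally(values):
    """Count how often each normalized, non-blank $3 value occurs."""
    return Counter(normalize(v) for v in values if v.strip())


def map_relation(value):
    """Return a property for a familiar $3 value, or None if unknown."""
    return KNOWN_RELATIONS.get(normalize(value))


if __name__ == "__main__":
    # Made-up sample values, not taken from the LC data.
    sample = [
        "Publisher description",
        "publisher  description.",
        "Table of contents",
        "Sample text",
    ]
    for text, count in tally(sample).most_common():
        print(count, text, map_relation(text) or "(unmapped)")
```

Anything that comes back unmapped is a candidate for the wiki list; the
lookup table would only ever cover the familiar values.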

Of course the $3 in your catalog data might be different from LC's, but
maybe we could come up with a list of common ones on a wiki somewhere,
and publish a little vocabulary that covered the important relations?

//Ed

[1] http://www.archive.org/details/marc_records_scriblio_net
[2] http://github.com/edsu/beat
[3] http://github.com/edsu/beat/raw/master/types.txt