I was going to try to reduce the space a bit by focusing on 650 fields. Each record with a Dewey number will become one tab-separated line: the Dewey number first, then each of the record's 650 fields in order. So something like:
305.42/0973 <tab> Women's rights -- United States -- History -- Sources. <tab> Women -- United States -- History -- Sources <tab> Manuscripts, American -- Facsimiles.
I thought it might at least be a place to start … it’s running on an EC2 instance right now :-)
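
For what it's worth, here is a minimal sketch of the "keep a count of the Dewey ranges for each subject heading" idea, applied to lines in the format above. It is not what is running on the instance. The function names (`dewey_class`, `count_dewey_by_heading`), the truncation to the first three Dewey digits, and the stripping of a trailing period from each heading are all assumptions made for illustration. The `first_only` flag is a hypothetical way to try the "first heading only" pass.

```python
from collections import Counter, defaultdict


def dewey_class(number, digits=3):
    """Reduce a Dewey number such as '305.42/0973' to its leading digits ('305')."""
    return number.strip().split(".")[0][:digits]


def count_dewey_by_heading(lines, first_only=False):
    """Map each subject heading to a Counter of the Dewey classes seen with it.

    Each line is expected to be: Dewey number, TAB, heading, TAB, heading, ...
    With first_only=True only the first heading on each line is counted.
    """
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0].strip():
            continue  # skip lines with no Dewey number or no headings
        dewey = dewey_class(fields[0])
        headings = fields[1:2] if first_only else fields[1:]
        for heading in headings:
            # Crude normalization (an assumption): trim whitespace and a trailing period.
            heading = heading.strip().rstrip(".")
            if heading:
                counts[heading][dewey] += 1
    return counts


sample = [
    "305.42/0973\tWomen's rights -- United States -- History -- Sources."
    "\tWomen -- United States -- History -- Sources"
    "\tManuscripts, American -- Facsimiles.",
]
counts = count_dewey_by_heading(sample)
print(counts["Manuscripts, American -- Facsimiles"].most_common(1))
```

Run over the whole file, `counts[heading].most_common()` would give a rough ranking of candidate Dewey classes for a heading. Comparing the result against `count_dewey_by_heading(lines, first_only=True)` is one way to see whether the first heading really does track the classification more closely.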
//Ed
On Dec 10, 2013, at 4:26 PM, Karen Coyle <[log in to unmask]> wrote:
> I've often thought that this would be an interesting exercise if someone would undertake it.
>
> Just a reminder: in theory (IN THEORY) the first subject heading in an LC record is the one most semantically close to the assigned subject classification. So perhaps a first pass with the FIRST 6xx might give a more refined matching. And then it would be interesting to compare that with the results using all 600-651's.
>
> kc
>
> On 12/10/13, 1:18 PM, Edward Summers wrote:
>> Not a naive idea at all. If you have the stomach for it, you could extract the Subject Heading / Dewey combinations out of say the LC Catalog MARC data [1] to use as training data for some kind of clustering [2] algorithm. You might even be able to do something simple like keep a count of the Dewey ranges associated with each subject heading.
>>
>> I’m kind of curious myself, so I could work on getting the subject heading / dewey combinations if you want?
>>
>> //Ed
>>
>> [1] https://archive.org/details/marc_records_scriblio_net
>> [2] https://en.wikipedia.org/wiki/Cluster_analysis
>>
>> On Dec 10, 2013, at 8:18 AM, Irina Arndt <[log in to unmask]> wrote:
>>
>>> Hi CODE4LIB,
>>>
>>> we would like to add DDC classes to a bunch of MARC records which contain only LoC Subject Headings.
>>> Does anybody know whether a mapping between LCSH and DDC exists anywhere (and is available)?
>>>
>>> I understand that WebDewey http://www.oclc.org/dewey/versions/webdewey.en.html might provide such a service, but
>>>
>>> · we are not OCLC customers or subscribers to WebDewey
>>>
>>> · even if we were, I'm not sure the service would match our needs
>>>
>>> I'm thinking of a tool where I can upload my list of subject headings and get back a list with the matching Dewey classes added (but a 'simple' csv file with LCSH terms and DDC classes would be helpful as well; I am fully aware that neither LCSH nor DDC is simple at all...). Naïve idea...?
>>>
>>> Thanks for any clues,
>>> Irina
>>>
>>>
>>> -------
>>>
>>> Irina Arndt
>>> Max Planck Digital Library (MPDL)
>>> Library System Coordinator
>>> Amalienstr. 33
>>> D-80799 Muenchen, Germany
>>>
>>> Tel. +49 89 38602-254
>>> Fax +49 89 38602-290
>>>
>>> Email: [log in to unmask]
>>> http://www.mpdl.mpg.de
>
> --
> Karen Coyle
> [log in to unmask] http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet