I was going to try to reduce the space a bit by focusing on 650 fields. Each record with a Dewey number will become a tab-separated line that includes each 650 field in order. So something like:

305.42/0973 <tab> Women's rights -- United States -- History -- Sources. <tab> Women -- United States -- History -- Sources <tab> Manuscripts, American -- Facsimiles.

I thought it might be a place to start at least … it’s running on an EC2 instance right now :-)

//Ed

On Dec 10, 2013, at 4:26 PM, Karen Coyle <[log in to unmask]> wrote:

> I've often thought that this would be an interesting exercise if someone would undertake it.
>
> Just a reminder: in theory (IN THEORY) the first subject heading in an LC record is the one most semantically close to the assigned subject classification. So perhaps a first pass with the FIRST 6xx might give a more refined matching. And then it would be interesting to compare that with the results using all 600-651's.
>
> kc
>
> On 12/10/13, 1:18 PM, Edward Summers wrote:
>> Not a naive idea at all. If you have the stomach for it, you could extract the Subject Heading / Dewey combinations out of, say, the LC Catalog MARC data [1] to use as training data for some kind of clustering [2] algorithm. You might even be able to do something simple like keep a count of the Dewey ranges associated with each subject heading.
>>
>> I’m kind of curious myself, so I could work on getting the subject heading / Dewey combinations if you want?
>>
>> //Ed
>>
>> [1] https://archive.org/details/marc_records_scriblio_net
>> [2] https://en.wikipedia.org/wiki/Cluster_analysis
>>
>> On Dec 10, 2013, at 8:18 AM, Irina Arndt <[log in to unmask]> wrote:
>>
>>> Hi CODE4LIB,
>>>
>>> we would like to add DDC classes to a bunch of MARC records that contain only LoC Subject Headings.
>>> Does anybody know whether a mapping between LCSH and DDC exists anywhere (and is available)?
>>>
>>> I understood that WebDewey http://www.oclc.org/dewey/versions/webdewey.en.html might provide such a service, but
>>>
>>> · we are not OCLC customers or subscribers to WebDewey
>>>
>>> · even if we were, I'm not sure whether the service matches our needs
>>>
>>> I'm thinking of a tool where I can upload my list of subject headings and get back a list with the matching Dewey classes added (but a 'simple' csv file with LCSH terms and DDC classes would be helpful as well; I am fully aware that neither LCSH nor DDC is simple at all...). Naïve idea...?
>>>
>>> Thanks for any clues,
>>> Irina
>>>
>>> -------
>>>
>>> Irina Arndt
>>> Max Planck Digital Library (MPDL)
>>> Library System Coordinator
>>> Amalienstr. 33
>>> D-80799 Muenchen, Germany
>>>
>>> Tel. +49 89 38602-254
>>> Fax +49 89 38602-290
>>>
>>> Email: [log in to unmask]
>>> http://www.mpdl.mpg.de
>
> --
> Karen Coyle
> [log in to unmask] http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet
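
For anyone who wants to try the "keep a count of the Dewey ranges associated with each subject heading" idea from this thread, here is a minimal sketch. It assumes the tab-separated layout described at the top (Dewey number first, then each 650 heading). The function names, the choice of the three-digit class as the "range", the trailing-period cleanup, and the second sample line are all assumptions made for illustration, not part of anyone's actual extract.

```python
from collections import Counter, defaultdict


def dewey_range(number):
    """Reduce a Dewey number such as '305.42/0973' to its three-digit class ('305').

    Treating the three-digit class as the "range" is an assumption; a finer
    or coarser bucket may work better in practice.
    """
    whole = number.strip().split(".")[0].replace("/", "")
    return whole[:3]


def count_ranges(lines):
    """Map each subject heading to a Counter of the Dewey classes it appears with."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # no headings on this line
        rng = dewey_range(fields[0])
        if len(rng) != 3 or not rng.isdigit():
            continue  # skip anything that does not look like a Dewey number
        for heading in fields[1:]:
            # Assumption: drop the trailing period so that 'Sources.' and
            # 'Sources' count as the same heading.
            heading = heading.strip().rstrip(".")
            if heading:
                counts[heading][rng] += 1
    return counts


# The first line is the example from this thread; the second is made up.
sample = [
    "305.42/0973\tWomen's rights -- United States -- History -- Sources."
    "\tWomen -- United States -- History -- Sources"
    "\tManuscripts, American -- Facsimiles.",
    "305.4\tWomen -- United States -- History -- Sources.",
]

counts = count_ranges(sample)
for heading, ranges in sorted(counts.items()):
    print(heading, ranges.most_common(3))
```

Restricting `fields[1:]` to `fields[1:2]` would give the first-heading-only pass suggested earlier in the thread, so the two results could be compared.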