If that isn't LCSH, is the entirety of LCSH available electronically in
some form (at least as an easily accessible file or files that can be
processed)?
On Tue, Dec 8, 2009 at 10:16 AM, Karen Coyle wrote:
> Quoting Keith Jenkins:
>> The frequency of an LCSH term within the LC catalog could also be
>> useful for ranking, although I'm not sure if such data would be
>> readily available.
> Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a
> copy of the LC subject authority file. The entries in this file form the
> basis for subject headings, most of which add "facets" to the authority
> entry when forming the subject heading. One could do a left-anchored match
> against actual headings, and that might provide some interesting statistics.
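
A rough sketch of what such a left-anchored match might look like, in Python. Everything here is an assumption for illustration: the function name, the sample strings, and the use of "--" as the separator between an authority entry and the subdivisions added to it. It is not based on the actual layout of the id.loc.gov file or of any catalog export.

```python
from collections import Counter

SEP = "--"  # assumed separator between a heading and its subdivisions


def left_anchored_counts(authority_entries, headings):
    """For each authority entry, count the headings that start with it.

    Matching is done on whole subdivision boundaries, so the entry
    "Cats" matches "Cats--Behavior" but not "Catskill Mountains".
    """
    entries = set(authority_entries)
    counts = Counter({entry: 0 for entry in entries})
    for heading in headings:
        parts = [part.strip() for part in heading.split(SEP)]
        # Try every left-anchored prefix: "A", "A--B", "A--B--C", ...
        for i in range(1, len(parts) + 1):
            prefix = SEP.join(parts[:i])
            if prefix in entries:
                counts[prefix] += 1
    return counts


# Hypothetical sample data, not taken from any real file.
sample_entries = ["Cats", "Cats--Behavior", "Dogs"]
sample_headings = [
    "Cats",
    "Cats--Behavior--Humor",
    "Cats--Juvenile fiction",
    "Catskill Mountains (N.Y.)--History",
    "Dogs--Training",
]
sample_counts = left_anchored_counts(sample_entries, sample_headings)
```

Splitting on the separator rather than calling startswith() on the raw string is a deliberate choice: it keeps an entry from matching unrelated headings that merely share its first few letters.
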
> Edward Betts of the Open Library project did some casual data gathering for
> subjects, and posted his "top 1000" subject headings (not subject
> The OL has decided to break up the subject headings into their subfields,
> and somewhere there are some pages that show some subfields with the highest
> ranking subfields they appear with. (There must be a better way to say that!
> Sorry, too early, too few cups of tea.) One example is here:
> I think that something like this will be incorporated into the next version
> of OL, which will be heavily navigation-oriented rather than
> p.s. Anyone who wants to play with a file can grab the OL data export:
> Unfortunately it includes both LC and non-LC subjects (mainly BISAC from
>> Another possibility would be a simple count of broader terms +
>> narrower terms + related terms or something like that. Although
>> PageRank would probably be better, since even some "important" terms
>> might have a relatively small number of immediately-adjacent links.
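
To make the two ranking ideas concrete, here is a small Python sketch that computes both a plain link count and a basic PageRank by power iteration. The term graph is a made-up toy, the function names are hypothetical, and treating broader, narrower, and related links as one undifferentiated set of links is an assumption, not something the authority file prescribes.

```python
def link_counts(links):
    """Simple score: number of broader + narrower + related links per term."""
    return {term: len(neighbors) for term, neighbors in links.items()}


def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    `links` maps each term to the list of terms it links to. Every
    linked term is assumed to also appear as a key in `links`.
    """
    terms = list(links)
    n = len(terms)
    rank = {term: 1.0 / n for term in terms}
    for _ in range(iterations):
        # Every term gets the same baseline share each round.
        new_rank = {term: (1.0 - damping) / n for term in terms}
        for term in terms:
            # A term with no links spreads its rank evenly over all terms.
            targets = links[term] or terms
            share = damping * rank[term] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph; links are listed in both directions.
toy_links = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Science", "Optics"],
    "Biology": ["Science"],
    "Optics": ["Physics"],
}
toy_counts = link_counts(toy_links)
toy_ranks = pagerank(toy_links)
```

The link count only looks at a term's immediate neighbors, while the PageRank score also depends on how well connected those neighbors are, which is the distinction raised in the quoted paragraph.
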
> Karen Coyle
> http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet