Quoting Keith Jenkins:
>
> The frequency of an LCSH term within the LC catalog could also be
> useful for ranking, although I'm not sure if such data would be
> readily available.
Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a
copy of the LC subject authority file. The entries in this file form
the basis for subject headings, most of which add "facets" to the
authority entry when forming the subject heading. One could do a
left-anchored match against actual headings, and that might provide
some interesting statistics.
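
A minimal sketch of that left-anchored match, in Python. Everything here is an assumption for illustration: the sample data is made up, headings are assumed to be plain strings with subdivisions joined by "--", and each heading is credited to the longest authority entry it begins with. The function name count_by_authority is hypothetical.

```python
from collections import Counter

def count_by_authority(authorities, headings, sep="--"):
    """Count headings whose leading subdivisions equal an authority entry."""
    def normalize(text):
        # Split on the separator and trim stray spaces around each part.
        return [part.strip() for part in text.split(sep)]

    known = {sep.join(normalize(a)) for a in authorities}
    counts = Counter({a: 0 for a in known})
    for heading in headings:
        parts = normalize(heading)
        # Try the longest prefix first, so "A--B--C" is credited to "A--B"
        # when that is an authority entry, and to "A" otherwise.
        for n in range(len(parts), 0, -1):
            prefix = sep.join(parts[:n])
            if prefix in known:
                counts[prefix] += 1
                break
    return counts

# Made-up sample data, not taken from any real file.
authorities = ["Cheese", "Cheese--Varieties", "Cooking"]
headings = [
    "Cheese--Varieties--France",
    "Cheese--History",
    "Cooking (Cheese)",
    "Cheese",
]
counts = count_by_authority(authorities, headings)
for entry, n in counts.most_common():
    print(entry, n)
```

Matching on whole subdivisions rather than raw string prefixes keeps "Cooking (Cheese)" from being counted under "Cooking".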
Edward Betts of the Open Library project did some casual data
gathering for subjects, and posted his "top 1000" subject headings
(not subject authorities):
http://edwardbetts.com/ol/top_1000_subjects
The OL has decided to break up the subject headings into their
subfields, and somewhere there are pages that show, for a given
subfield, the other subfields that most often appear with it. One
example is here:
http://home.us.archive.org/~edward/related/Cheese.html
I think that something like this will be incorporated into the next
version of OL, which will be heavily navigation-oriented rather than
search-oriented.
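
For anyone who wants to try the subfield idea on their own data, here is a rough Python sketch. It is not how the OL pages are built (I don't know their method); it assumes headings are strings with subfields joined by "--", uses made-up sample data, and the name related_subfields is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def related_subfields(headings, sep="--"):
    """For each subfield, count the other subfields it shares a heading with."""
    related = defaultdict(Counter)
    for heading in headings:
        # A set, so a subfield repeated within one heading is counted once.
        parts = {p.strip() for p in heading.split(sep) if p.strip()}
        # Every ordered pair (a, b) adds one to "b appears with a".
        for a, b in permutations(parts, 2):
            related[a][b] += 1
    return related

# Made-up sample data.
sample = [
    "Cheese--France--History",
    "Cheese--France",
    "Cheese--Varieties",
    "Wine--France",
]
related = related_subfields(sample)
print(related["Cheese"].most_common(1))
```

Sorting each subfield's counter by count gives the "highest ranking" companions for that subfield.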
kc
p.s. Anyone who wants to play with a file can grab the OL data export:
http://openlibrary.org/dev/docs/jsondump
Unfortunately, it includes both LC and non-LC subjects (the latter
mainly BISAC from Amazon).
>
> Another possibility would be a simple count of broader terms +
> narrower terms + related terms or something like that. Although
> PageRank would probably be better, since even some "important" terms
> might have a relatively small number of immediately-adjacent links.
>
> Keith
>
--
Karen Coyle
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet