Quoting Keith Jenkins <[log in to unmask]>:

>
> The frequency of an LCSH term within the LC catalog could also be
> useful for ranking, although I'm not sure if such data would be
> readily available.

Couple of things: first, what we have at id.loc.gov is NOT LCSH, but a  
copy of the LC subject authority file. The entries in this file form  
the basis for subject headings, most of which add "facets" to the  
authority entry when forming the subject heading. One could do a  
left-anchored match against actual headings, and that might provide  
some interesting statistics.
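A left-anchored match like this could be sketched roughly as follows. This is a hypothetical illustration, not how LC or id.loc.gov computes anything. It assumes headings are plain strings whose subdivisions are separated by "--" (the usual display convention), and the function name and sample data are made up. A heading counts toward an authority term only when the term is a whole left-hand prefix of the heading, so "Cheese" matches "Cheese--France" but not "Cheesecake".

```python
from collections import Counter


def left_anchored_counts(authority_terms, headings, sep="--"):
    """Count headings that begin with each authority term (hypothetical helper).

    Each heading is split on the subdivision separator, and every whole
    left-hand prefix ("A", "A--B", "A--B--C", ...) is looked up in the set
    of authority terms. This avoids false matches such as "Cheese"
    matching "Cheesecake".
    """
    terms = set(authority_terms)
    counts = Counter()
    for heading in headings:
        parts = [p.strip() for p in heading.split(sep)]
        for i in range(1, len(parts) + 1):
            prefix = sep.join(parts[:i])
            if prefix in terms:
                counts[prefix] += 1
    return counts


# Made-up sample data for illustration only.
sample_terms = ["Cheese", "Cheesecake", "Dairying"]
sample_headings = [
    "Cheese",
    "Cheese--France",
    "Cheesecake",
    "Cheese--History--20th century",
    "Dairying",
]
sample_counts = left_anchored_counts(sample_terms, sample_headings)
```

Looking up each prefix in a set keeps the work proportional to the number of subdivisions per heading. Comparing every heading against every authority term would get slow on a catalog-sized file.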

Edward Betts of the Open Library project did some casual data  
gathering for subjects, and posted his "top 1000" subject headings  
(not subject authorities):
http://edwardbetts.com/ol/top_1000_subjects
The OL has decided to break up the subject headings into their  
subfields, and somewhere there are pages that show, for a given  
subfield, the other subfields it most often appears with. One  
example is here:
http://home.us.archive.org/~edward/related/Cheese.html
I think that something like this will be incorporated into the next  
version of OL, which will be heavily navigation-oriented rather than  
search-oriented.
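Pages of this kind could be produced by counting subfield co-occurrence across headings. The sketch below is only a guess at the general idea and not the Open Library's actual code. It assumes headings are strings split on "--", and the function name and sample headings are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def related_subfields(headings, sep="--"):
    """Map each subfield to a Counter of the subfields it appears with.

    Each heading is broken into its subfields; duplicates within one
    heading are collapsed, and every ordered pair of distinct subfields
    is counted once for that heading.
    """
    related = defaultdict(Counter)
    for heading in headings:
        parts = {p.strip() for p in heading.split(sep) if p.strip()}
        for a, b in permutations(parts, 2):
            related[a][b] += 1
    return related


# Made-up sample headings for illustration only.
sample = [
    "Cheese--France",
    "Cheese--History",
    "Cheese--France--History",
    "Wine--France",
]
cooccurrence = related_subfields(sample)
top_for_cheese = cooccurrence["Cheese"].most_common(10)
```

Ranking by `most_common` gives a "related" list per subfield. Ties come back in no guaranteed order, because subfields are collected in a set.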

kc
p.s. Anyone who wants to play with a file can grab the OL data export:

http://openlibrary.org/dev/docs/jsondump

Unfortunately, it includes both LC and non-LC subjects (mainly BISAC  
from Amazon).

>
> Another possibility would be a simple count of broader terms +
> narrower terms + related terms or something like that, although
> PageRank would probably be better, since even some "important" terms
> might have a relatively small number of immediately adjacent links.
>
> Keith
>
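The two ranking ideas in the quoted message can be compared on a toy graph. In the sketch below, the "simple count" is each term's degree and PageRank is computed by plain power iteration. The following are all assumptions for illustration: broader/narrower/related links are treated as undirected edges, the damping factor is 0.85, a fixed 50 iterations are run, and the sample terms are made up. This is not data from id.loc.gov.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected adjacency map from (term, term) link pairs."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_scores(adjacency):
    """Score = number of immediately adjacent BT/NT/RT links."""
    return {term: len(neighbors) for term, neighbors in adjacency.items()}


def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank; every neighbor must also be a key."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleport share, plus rank from nodes with no links spread evenly.
        dangling = sum(rank[node] for node in nodes if not adjacency[node])
        new_rank = {node: (1.0 - damping + damping * dangling) / n for node in nodes}
        for node in nodes:
            neighbors = adjacency[node]
            if neighbors:
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
        rank = new_rank
    return rank


# Made-up sample links for illustration only.
sample_links = [
    ("Food", "Cheese"),
    ("Food", "Bread"),
    ("Food", "Fruit"),
    ("Food", "Vegetables"),
    ("Cheese", "Dairy products"),
]
graph = build_graph(sample_links)
degrees = degree_scores(graph)
ranks = pagerank(graph)
```

The degree count only sees a term's own links. PageRank also credits a term for being linked to terms that are themselves well connected, which is the property the message suggests matters for "important" terms with few direct links.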

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet