
On May 27, 2015, at 6:33 PM, Karen Coyle wrote:

>> In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [0, 1] ...
>> 
>> 'Want to give it a try? For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection's rsync file, and send the file to me. I will feed the rsync file to the Browser, and then send you the URL pointing to the results.
>> 
>> [0] introduction in a blog posting - http://ntrda.me/1FUGP2g
>> [1] HTRC Workset Browser - http://bit.ly/workset-browser
> 
> Eric, what happens if you access this from a non-HT institution? When I go to HT I am often unable to download public domain titles because they aren't available to members of the general public.


The short answer is, “Nothing”.

The long answer is… longer. The HathiTrust proper is accessible to anybody, but downloading public domain content is available only to subscribing institutions.

On the other hand, the “Workset Browser” is designed to work off the HathiTrust Research Center Portal, not the HathiTrust proper. The Portal is located at http://sharc.hathitrust.org. From there anybody can search the collection of public domain content, create collections, and apply various algorithms against those collections. One of the algorithms is “create RSYNC file”, which in turn lets you download bunches o’ metadata describing the items in your collection. (There is also a “download as MARC” algorithm.) This rsync file is the root of the Workset Browser: feed the Browser an rsync file, and it will mirror the content locally, index it, and generate reports describing the collection.
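For anyone curious what the “index it, and generate reports” steps might look like, here is a minimal Python sketch. It is not the Workset Browser's actual code. It assumes the mirroring step has already happened and that each item's text sits in a plain dictionary keyed by a hypothetical item id. The function names `build_index` and `collection_report` are made up for illustration, and the rsync/mirroring step is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical: words are runs of ASCII letters after lowercasing.
WORD = re.compile(r"[a-z]+")

def build_index(corpus):
    """Build an inverted index: word -> set of item ids containing that word.

    `corpus` is an assumed stand-in for locally mirrored content: item id -> text.
    """
    index = {}
    for item_id, text in corpus.items():
        # Use a set so each item is recorded once per word.
        for word in set(WORD.findall(text.lower())):
            index.setdefault(word, set()).add(item_id)
    return index

def collection_report(corpus, top_n=5):
    """Summarize a collection: number of items, total words, most frequent words."""
    counts = Counter()
    for text in corpus.values():
        counts.update(WORD.findall(text.lower()))
    return {
        "items": len(corpus),
        "words": sum(counts.values()),
        "top_words": counts.most_common(top_n),
    }

if __name__ == "__main__":
    # Toy two-item "collection" standing in for mirrored HathiTrust content.
    corpus = {
        "item1": "The whale and the sea.",
        "item2": "The sea, the sea!",
    }
    index = build_index(corpus)
    print(sorted(index["sea"]))          # ['item1', 'item2']
    print(collection_report(corpus, 2))  # {'items': 2, 'words': 9, 'top_words': [('the', 4), ('sea', 3)]}
```

A real “distant reading” tool would add stop-word removal and richer reports, but the shape is the same: an index for lookup, plus aggregate counts for describing the collection as a whole.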

Thank you for asking. Many people do not know there is a HathiTrust Research Center.

—
Eric Morgan