For a good time, I started playing with the HathiTrust Research Center (HTRC) and the Great Books of the Western World.

The HTRC is the beginning of a service providing a computable interface to some of the content of the HathiTrust. A couple of people from the Center visited Notre Dame a few weeks ago, and I blogged about the event as well as some of the interface's functionality. [0]

To learn more about the Center's functionality, I began a mini project. Specifically, I had a co-worker (Adam McGinn) create a public work set containing the Great Books of the Western World. I then ran one of the HTRC algorithms, the MARC record dumper, against the set. After getting the MARC(XML) records, I concatenated them into a single file to make processing easier. I then wrote a Perl script to read each record, extract rudimentary bibliographic and control information, and output an alphabetical list of the Great Books complete with links to the HathiTrust catalog and full-view displays. The data, script, and output are available at http://bit.ly/10Alu81
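The processing step described in the previous paragraph (read each record, pull out a few bibliographic and control fields, and emit an alphabetical list with links) might look roughly like the following. This is a minimal Python sketch, not the Perl script itself. It assumes the records use the standard MARC 21 slim XML namespace, that controlfield 001 holds the HathiTrust catalog record number, that 974$u holds the volume identifier, and that catalog and full-view links follow the URL patterns shown; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML (MARC 21 slim) namespace.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# ASSUMPTION: URL patterns for HathiTrust catalog and full-view pages.
CATALOG_URL = "https://catalog.hathitrust.org/Record/{}"
FULLVIEW_URL = "https://babel.hathitrust.org/cgi/pt?id={}"


def controlfield(record, tag):
    """Return the text of controlfield `tag`, or '' if absent."""
    node = record.find(f"marc:controlfield[@tag='{tag}']", NS)
    return (node.text or "").strip() if node is not None else ""


def subfield(record, tag, code):
    """Return the first subfield `code` of datafield `tag`, or ''."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, NS)
    return (node.text or "").strip() if node is not None else ""


def title_sort_key(record, title):
    """Skip nonfiling characters (245 second indicator), e.g. 'The '."""
    field = record.find("marc:datafield[@tag='245']", NS)
    ind2 = field.get("ind2", "0") if field is not None else "0"
    skip = int(ind2) if ind2.isdigit() else 0
    return title[skip:].lower()


def great_books(xml_documents):
    """Merge records from several MARCXML strings; return them sorted by title."""
    books = []
    for doc in xml_documents:
        root = ET.fromstring(doc)
        # A document may hold a single <record> or a <collection> of them.
        if root.tag == f"{{{NS['marc']}}}record":
            records = [root]
        else:
            records = root.findall("marc:record", NS)
        for record in records:
            # Strip trailing ISBD punctuation such as " /" or ".".
            title = subfield(record, "245", "a").rstrip(" /:;,.")
            books.append({
                "title": title,
                "author": subfield(record, "100", "a").rstrip(" ,."),
                # ASSUMPTION: 001 = catalog record number, 974$u = volume id.
                "catalog": CATALOG_URL.format(controlfield(record, "001")),
                "fullview": FULLVIEW_URL.format(subfield(record, "974", "u")),
                "sort": title_sort_key(record, title),
            })
    return sorted(books, key=lambda book: book["sort"])
```

Rather than concatenating files on disk first, this sketch accepts several MARCXML documents and merges their records in memory. Printing each entry as a tab-separated line (title, author, catalog link, full-view link) would give a list similar in spirit to the one described. Sorting on the 245 second indicator keeps titles such as "The Odyssey" filed under "O" rather than "T".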

Next steps include downloading the raw digitized data and analyzing it. Fun with modern librarianship?

[0] blog posting about the HTRC - http://dh.crc.nd.edu/blog/2013/05/htrc/

--
Eric Lease Morgan, Digital Initiatives Librarian
University of Notre Dame