Eric, I'm planning a similar but not identical bit of work based on the
Open Library. In my case, we are hoping to produce MARC records that
libraries can include in their own catalogs for open access books.
Because this is aimed at public libraries, we are looking for "popular"
reading, as well as "great" reading.

The HathiTrust books, while publicly viewable, are often not
downloadable without a partner log-in. Those exact same books are often
available on the Internet Archive for public download and in a variety
of e-book formats.

On 6/5/13 6:50 AM, Eric Lease Morgan wrote:
> For a good time, I started playing with the HathiTrust Research Center (HTRC) and the Great Books Of The Western World.
> The HTRC is the beginnings of a service providing a computable interface to some of the content of the HathiTrust. A couple of people from the Center came to visit Notre Dame a few weeks ago, and I blogged about the event as well as some of the functionality of the interface.
> In an effort to learn more about the Center's functionality, I began a mini project. Specifically, I have had a co-worker (Adam McGinn) create a public work set containing the Great Books Of The Western World. I then ran an HTRC algorithm against the set -- the MARC record dumper. After getting the MARC(XML) records I concatenated them into a single file in order to make processing easier. I then wrote a Perl script to read each record, extract rudimentary bibliographic and control information from the data, and output an alphabetical list of the Great Books complete with links to the HathiTrust catalog and full view displays. Data, script, and output are available at http://bit.ly/10Alu81
> Some of the next steps are to download the raw digitized data and do analysis against it. Fun with modern librarianship?
>  blog posting about the HTRC - http://dh.crc.nd.edu/blog/2013/05/htrc/
> Eric Lease Morgan, Digital Initiatives Librarian
> University of Notre Dame
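
For anyone wanting to try the record-listing step described in the quoted message, here is a rough sketch in Python rather than Perl. It is not Eric's script. It assumes the concatenated file is a standard MARCXML collection, that field 001 holds an identifier usable in a HathiTrust catalog link, and that the link pattern in CATALOG_URL is right; the function name list_titles is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML ("MARC21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
RECORD_TAG = "{http://www.loc.gov/MARC21/slim}record"

# Assumption: catalog links follow this pattern and 001 carries the record id.
CATALOG_URL = "https://catalog.hathitrust.org/Record/{}"


def list_titles(marcxml_text):
    """Return (title, catalog link) pairs sorted alphabetically by title."""
    root = ET.fromstring(marcxml_text)
    rows = []
    for record in root.iter(RECORD_TAG):
        control = record.find("marc:controlfield[@tag='001']", MARC_NS)
        title = record.find(
            "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
        )
        # Skip records missing either the identifier or the title.
        if control is None or title is None:
            continue
        # Drop surrounding spaces and trailing ISBD punctuation such as " /".
        clean_title = (title.text or "").strip(" /:")
        record_id = (control.text or "").strip()
        rows.append((clean_title, CATALOG_URL.format(record_id)))
    # Case-insensitive alphabetical order by title.
    return sorted(rows, key=lambda row: row[0].lower())
```

Calling list_titles() on the text of the concatenated file and printing each pair would give a plain alphabetical list; writing the pairs out as HTML links would be a small extra step.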
http://kcoyle.net