On Feb 10, 2019, at 8:50 PM, Eric Lease Morgan wrote:

> I've finally figured out how to get raw OCR text out of the HathiTrust API, but it is really slow. Any hints out there?...

In a fit of creativity, I hacked together some Bash/Python scripts to programmatically download plain (OCR) text as well as PDF files from the HathiTrust. Here is a synopsis of how to use them:

  Given an access key, secret token, and a HathiTrust identifier,
  output plain text as well as PDF versions of a book.

  $ ./bin/ <token> <key> <identifier>
  $ ./bin/ <token> <key> <identifier> <length>
  $ ./bin/ <token> <key> <identifier>
  $ ./bin/ <token> <key> <tsv>
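
To give a rough feel for the batch (<tsv>) case, here is a minimal Python sketch. It is not the code from the repository. It assumes a hypothetical TSV layout of identifier<TAB>page count, and it takes a caller-supplied fetch_page function (also hypothetical) that would do the signed request for one page of OCR. The one-request-per-page loop is an assumption too, though it would explain why the process is slow.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple


def read_rows(tsv_text: str) -> List[Tuple[str, int]]:
    """Parse 'identifier<TAB>length' rows; skip blank lines and # comments.

    The two-column layout is an assumption made for this sketch.
    """
    rows = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        rows.append((row[0].strip(), int(row[1])))
    return rows


def collect_text(identifier: str, length: int,
                 fetch_page: Callable[[str, int], str]) -> str:
    """Fetch pages 1..length one at a time and join them into one text."""
    pages = [fetch_page(identifier, seq) for seq in range(1, length + 1)]
    return "\n".join(pages)


def harvest(tsv_text: str,
            fetch_page: Callable[[str, int], str]) -> Dict[str, str]:
    """Return a mapping of identifier to full plain text for every row."""
    return {
        identifier: collect_text(identifier, length, fetch_page)
        for identifier, length in read_rows(tsv_text)
    }


if __name__ == "__main__":
    # A stand-in fetcher so the sketch runs without network access or keys.
    def fake_fetch(identifier: str, seq: int) -> str:
        return f"[{identifier} page {seq}]"

    demo = "demo.0001\t2\n# a comment\ndemo.0002\t1\n"
    for name, text in harvest(demo, fake_fetch).items():
        print(name, len(text))
```

Passing the fetcher in as a function keeps the credentials and the network call out of the loop, so the batch logic can be exercised with a stand-in before any real requests are made.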

The process is not fast, but it is very functional. For more detail, see the GitHub repository -->  We now return you to the regularly scheduled programming.

Eric Morgan
University of Notre Dame