I've finally figured out how to get raw OCR text out of the HathiTrust API, but it is really slow. Any hints out there?

To use the HathiTrust Data API a person needs to first get a couple of access tokens. Applications then need to use the tokens to authenticate. Once this is done, a simple URL can be sent and cool stuff will be returned. For example, the following URL will return the first page of OCR:

By continually incrementing the number in the URL, other pages can be retrieved:

By incrementing the URL until an error is returned, one can get the whole of the document. I don't think there is a way to get the whole of the document in one go.
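In outline, the loop looks something like the following. This is only a minimal sketch in Python, not the attached code. The fetch_page function and the PageNotFound exception are hypothetical stand-ins for whatever actually builds the signed URL, sends the request, and detects the error response. The fake_fetch function exists only so the sketch can be run without touching the API.

```python
from typing import Callable, List


class PageNotFound(Exception):
    """Hypothetical error raised by a fetcher when the requested page does not exist."""


def fetch_all_pages(fetch_page: Callable[[int], str],
                    start: int = 1,
                    max_pages: int = 10000) -> List[str]:
    """Request pages one at a time, incrementing the number until an error comes back.

    fetch_page is an assumption: any callable that takes a page number and
    returns that page's OCR text, or raises PageNotFound past the last page.
    max_pages is a safety cap so a misbehaving fetcher cannot loop forever.
    """
    pages: List[str] = []
    for seq in range(start, start + max_pages):
        try:
            pages.append(fetch_page(seq))
        except PageNotFound:
            break  # the first error marks the end of the document
    return pages


def fake_fetch(seq: int) -> str:
    """Stand-in fetcher for a three-page document; makes no network calls."""
    if seq > 3:
        raise PageNotFound(seq)
    return f"text of page {seq}"


if __name__ == "__main__":
    document = "\n".join(fetch_all_pages(fake_fetch))
    print(document)
```

Every request waits for the previous one to finish, so the total time grows with the number of pages.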

Similarly, a person can get page images:

Again, by incrementing the URL until an error is returned, all the images can be downloaded, and a PDF file could be created.

By combining the traditional reading of a book (PDF) with the text mining of the OCR, very interesting things can take place, and a more thorough understanding of the book could be obtained.

Unfortunately, continually requesting individual pages seems laborious, not to mention s l o w. It takes tens of minutes to do the good work.
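One idea I have been toying with is requesting several pages at the same time instead of one after another. Below is a rough sketch in Python using only the standard library. It assumes a hypothetical fetch_page function that returns the OCR for a given page number and raises an exception when the page does not exist. I do not know whether the API tolerates parallel requests or throttles them, so the number of workers is a guess.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def fetch_or_none(fetch_page: Callable[[int], str], seq: int) -> Optional[str]:
    """Return the page text, or None if the (hypothetical) fetcher raises any error."""
    try:
        return fetch_page(seq)
    except Exception:
        return None


def fetch_all_pages_concurrently(fetch_page: Callable[[int], str],
                                 workers: int = 8,
                                 start: int = 1,
                                 max_pages: int = 10000) -> List[str]:
    """Fetch pages in batches of `workers` parallel requests.

    Stops at the first page that fails, mirroring the
    "increment until an error is returned" approach.
    """
    pages: List[str] = []
    end = start + max_pages
    with ThreadPoolExecutor(max_workers=workers) as pool:
        seq = start
        while seq < end:
            batch = range(seq, min(seq + workers, end))
            # pool.map returns results in the same order as the page numbers
            results = list(pool.map(lambda s: fetch_or_none(fetch_page, s), batch))
            for text in results:
                if text is None:
                    return pages  # first missing page ends the document
                pages.append(text)
            seq += workers
    return pages
```

The last batch will request a few pages past the end of the document, which wastes a handful of requests but keeps the logic simple.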

Attached is the code I use to do the work. Can you suggest ways things could be sped up? Am I missing something when it comes to the API? Maybe if I do the work in a HathiTrust Research Center "capsule" things would be faster? 

Eric Morgan