Hi there, 

There’s a python library that I’ve had good luck with ( ). It comes with a command line tool, ia ( ), that will easily do what you’d like. 

“ ia search ‘collection:bplsceep’ “ will return the list of Internet Archive identifiers for that collection. 


Katie Mika
Biodiversity Heritage Library NDSR Resident
Ernst Mayr Library, Museum of Comparative Zoology, Harvard University
[log in to unmask] | 281-384-5789

On 9/18/17, 3:37 PM, "Code for Libraries on behalf of Eric Lease Morgan" <[log in to unmask] on behalf of [log in to unmask]> wrote:

    Is there an Internet Archive API that will allow me to get the contents of a collection as a stream of data and not as a stream of HTML.
    A cool collection of early English print materials is available at the following URL:
    Each item is associated with an Internet Archive identifier. If I were able to easily extract these identifiers, then I would be more easily able to provide services based on the collection. But I’m lazy. I don’t want to read the HTML and scrape it accordingly. Ick! I’d rather be given the list of bibliographics in a more computer-friendly way.
    Again, can I programmatically read the contents of a Internet Archive collection?
    Eric Morgan