On 18/09/17 21:37, Eric Lease Morgan wrote:
> A cool collection of early English print materials is available at the following URL:
>   https://archive.org/details/bplsceep
>
> Again, can I programmatically read the contents of a Internet Archive collection?

This tool is what you need: https://internetarchive.readthedocs.io/en/latest/

To get a list of all items in the collection:

$ ia search -i collection:bplsctpbs > bplsctpbs.txt

The txt file contains one identifier per row:

$ wc -l bplsctpbs.txt
824 bplsctpbs.txt
$ head -n5 bplsctpbs.txt
accountofcountri00dobb_0
accountofenglish01lang
accountofenglish02lang
accountofenglish03lang
admirableeuentss00camu

Then you can fetch the metadata of all items (using GNU parallel, https://www.gnu.org/software/parallel/):

$ parallel ia metadata {} :::: bplsctpbs.txt > all.json
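
If GNU parallel is not installed, a plain shell loop should collect the same metadata, only one item at a time and therefore more slowly. This is a minimal sketch, not part of the ia tool: the function name fetch_metadata is made up for illustration, and it assumes the identifier file has one identifier per row, as shown above.

```shell
#!/bin/sh
# fetch_metadata is a hypothetical helper name, not part of the ia tool.
# It reads one identifier per line from the file given as $1 and calls
# "ia metadata" on each, writing everything to standard output.
fetch_metadata() {
    while IFS= read -r id; do
        # Skip blank lines so ia is never called with an empty identifier.
        [ -n "$id" ] || continue
        # Redirect stdin so the command cannot swallow the rest of the list.
        ia metadata "$id" < /dev/null
    done < "$1"
}

# Usage (sequential equivalent of the parallel command):
#   fetch_metadata bplsctpbs.txt > all.json
```

Unlike parallel, the loop keeps the output in the same order as the identifier file, which can make it easier to match results back to the list.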