I have been tasked with downloading 30,000 archival masters from the
Internet Archive over the next few years, into an archive for long-term
staff access that we may preserve with LOCKSS. These are masters of Montana
state publications. I have a hierarchy in mind to receive these files:
state agency\year\title\pub_date\*.pdf.

I intend to download the files in batches of 200 to 500 PDFs, but I am
concerned that if I slot them automatically into the archive hierarchy,
misplaced or missing files could become very hard to find as the total grows.
I will be logging the downloads, which should give me some control. Are there
other strategies for ensuring that I can readily find and correct download
errors? I am looking for recommendations for the simplest way to maintain
reasonable control over the download process.
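
To make the kind of control I have in mind concrete, here is a rough sketch of
one possible check, not a settled plan. It assumes I keep a manifest for each
batch that maps every intended destination path to an expected MD5 checksum,
and that I run the comparison after each batch. The function names, the
manifest layout, and the example paths are all hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reconcile(manifest, archive_root):
    """Compare a manifest against the files actually in the archive.

    manifest: dict mapping a relative path (forward slashes), e.g. the
    hypothetical "agency/1998/title/1998-06-01/report.pdf", to the MD5
    checksum expected for that file.
    """
    root = Path(archive_root)
    # Every PDF currently under the archive root, as relative POSIX paths.
    # Note: the "*.pdf" pattern is case-sensitive on most Linux systems.
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*.pdf")}
    expected = set(manifest)

    missing = sorted(expected - on_disk)      # listed but not in the archive
    unexpected = sorted(on_disk - expected)   # in the archive but not listed
    corrupt = sorted(                         # present but checksum differs
        rel for rel in on_disk & expected
        if md5_of(root / rel) != manifest[rel]
    )
    return {"missing": missing, "unexpected": unexpected, "corrupt": corrupt}
```

Running something like this after every batch would report missing, misplaced,
and damaged files while the batch is still small enough to fix by hand, rather
than leaving them to surface once the archive holds thousands of files.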