Over the next few years, I have been tasked with downloading 30,000 archival masters
from the Internet Archive into an archive for long-term staff access, which we may
preserve with LOCKSS. These are masters of Montana state publications. I
have a hierarchy in mind to receive these files:
state agency\year\title\pub_date\*.pdf.
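
To make the hierarchy concrete, here is a minimal sketch of how a target path might be built from a file's metadata. The function name `target_path`, its parameters, the `archive` root, and the sample values are all hypothetical placeholders, not real records; I am also assuming Windows-style separators, since that is how I wrote the hierarchy.

```python
from pathlib import PureWindowsPath

def target_path(root, agency, year, title, pub_date, filename):
    # Assemble root\agency\year\title\pub_date\filename.
    # PureWindowsPath only builds the path string; it does not touch the disk.
    return PureWindowsPath(root, agency, str(year), title, pub_date, filename)

# Hypothetical sample values, for illustration only.
example = target_path(
    "archive", "Example Agency", 2009, "Example Title", "2009-06-01", "master.pdf"
)
print(example)
```

Building the path in one place like this would at least mean every file is slotted by the same rule, so a misplaced file points to bad metadata rather than an inconsistent script.
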
I intend to download the files in batches of 200 to 500 PDFs, but I am
concerned that if I slot them automatically into the archive hierarchy, misplaced
or missing files could become very hard to find as the total grows. I will be logging
the downloads, which should give me some control. Are there other strategies
for ensuring that I can readily correct download errors? I am looking for
recommendations for the simplest way to maintain reasonable control over the
download process.
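
For context, the kind of check I imagine running from the download log is roughly the following. This is only a sketch under assumptions: the function name `audit` is hypothetical, and it assumes the log can be reduced to a list of relative paths (with forward slashes) that each batch was supposed to produce.

```python
from pathlib import Path

def audit(root, logged_paths):
    # Compare the relative paths recorded in the log against the PDFs
    # actually present under root. Returns (missing, unexpected).
    root = Path(root)
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*.pdf")}
    logged = set(logged_paths)
    missing = sorted(logged - on_disk)      # logged but not found on disk
    unexpected = sorted(on_disk - logged)   # on disk but never logged
    return missing, unexpected

# Usage (hypothetical paths):
# missing, unexpected = audit("archive", ["agency/2009/title/2009-06-01/a.pdf"])
```

Run after each batch, a comparison like this would flag missing files and files that landed somewhere the log does not account for. It would not detect a truncated or corrupted PDF; that would need a size or checksum comparison as well.
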