Thanks for raising the cost of verifying that the data is consistent. We will be using DSpace for now, and I know DSpace has some checksum functionality built in out of the box. It shouldn't be too difficult to write a script that loops through DSpace's checksum data and compares it against the files in Glacier. According to the Glacier FAQ on Amazon's site, they provide an archive inventory (updated daily) that can be downloaded as JSON. Some users report that this inventory includes checksum data. So hopefully it will just be a matter of comparing the local checksum to the Glacier checksum, and that would be easy enough to script.
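One wrinkle for such a script: as far as I know, the Glacier inventory JSON lists each archive under "ArchiveList" with a "SHA256TreeHash" field, while DSpace's checksum checker stores MD5 by default, so the two values would not be directly comparable. Below is a minimal Python sketch that recomputes the tree hash locally and compares it with the inventory. It assumes the archive description was set to a DSpace bitstream identifier at upload time, which is a hypothetical convention for illustration and not something Glacier or DSpace does for you. The function names are also made up for this example.

```python
import hashlib

MIB = 1024 * 1024  # Glacier tree hashes are built from 1 MiB leaves


def tree_hash(chunks):
    """Return the hex SHA-256 tree hash for an iterable of 1 MiB byte chunks."""
    level = [hashlib.sha256(c).digest() for c in chunks]
    if not level:
        level = [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                nxt.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                # An unpaired trailing digest is promoted unchanged.
                nxt.append(pair[0])
        level = nxt
    return level[0].hex()


def file_chunks(path):
    """Yield a local file in 1 MiB blocks so large files are not read at once."""
    with open(path, "rb") as fh:
        while True:
            block = fh.read(MIB)
            if not block:
                break
            yield block


def find_mismatches(inventory, local_hashes):
    """Compare local tree hashes against a parsed Glacier inventory.

    inventory: dict parsed from the inventory JSON.
    local_hashes: {archive description: locally computed tree hash}.
    ASSUMPTION: the description identifies the local file (hypothetical).
    Returns descriptions that are missing from Glacier or differ.
    """
    remote = {a["ArchiveDescription"]: a["SHA256TreeHash"]
              for a in inventory["ArchiveList"]}
    return sorted(k for k, v in local_hashes.items() if remote.get(k) != v)
```

A local hash would be computed with tree_hash(file_chunks(path)), and the inventory loaded with json.load() from the downloaded file. Anything returned by find_mismatches() is a candidate for a closer look, not proof of corruption.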
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Ryan Eby
Sent: Friday, January 11, 2013 11:37 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Digital collection backups
As Aaron alludes to, your decision should be based on your real needs, and the options might not be mutually exclusive.
LOCKSS/MetaArchive might be worth the money if it is the community archival aspect you are going for. Depending on your institution, being a participant might make political/mission sense regardless of the storage needs, and it could be just a specific collection that makes sense.
Glacier is a great choice if you are looking to spread a backup across regions. S3 similarly, if you also want to benefit from CloudFront (the CDN setup) to take load off your institution's server (you can now use CloudFront with your own origin server as well). Depending on your bandwidth this might be worth the money regardless of LOCKSS participation (which can be more dark). Amazon also tends to drop prices over time rather than raise them, but as with any outsourcing you have to plan for the possibility that it might not exist in the future. Also look more closely at Glacier prices in terms of checking your data for consistency. There have been a few papers on the costs of making sure Amazon really has the proper data, depending on how often your requirements call for a check.
Another option, if you are just looking for more geo placement, is finding an institution or service provider that will colocate. There may be another small institution that would love to shove a cheap box with hard drives on your network in exchange for the same. Not as involved/formal as LOCKSS, but it gives you something you control to satisfy your requirements. It could also be as low tech as shipping SSDs to another institution, which then runs some BagIt checksums on the drive, etc.
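For the low-tech route, the receiving institution's check could look something like the Python sketch below. It assumes the drive holds a bag with a manifest-md5.txt at its root, in the usual BagIt layout of one "checksum path" pair per line. The function name verify_bag is made up for this example, and an existing BagIt tool may well be a better choice than a hand-rolled script.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir, algorithm="md5"):
    """Return payload paths that are missing or whose checksum differs.

    Reads manifest-<algorithm>.txt from the bag root; each line is
    expected to hold a hex checksum, whitespace, and a relative path.
    """
    bag = Path(bag_dir)
    failures = []
    manifest = bag / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        digest = hashlib.new(algorithm)
        with open(target, "rb") as fh:
            # Read in 1 MiB blocks so large files do not fill memory.
            for block in iter(lambda: fh.read(1 << 20), b""):
                digest.update(block)
        if digest.hexdigest() != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list from verify_bag("/path/to/bag") means every file listed in the manifest is present and matches. It does not detect extra files that the manifest never mentions.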
All of the above should be scriptable in your workflow. Just need to decide what you really want out of it.
On Fri, Jan 11, 2013 at 11:52 AM, Aaron Trehub <[log in to unmask]> wrote:
> Hello Josh,
> Auburn University is a member of two Private LOCKSS Networks: the
> MetaArchive Cooperative and the Alabama Digital Preservation Network
> (ADPNet). Here's a link to a recent conference paper that describes
> both networks, including their current pricing structures:
> LOCKSS has worked well for us so far, in part because supporting
> community-based solutions is important to us. As you point out,
> however, Glacier is an attractive alternative, especially for
> institutions that may be more interested in low-cost, low-throughput
> storage and less concerned about entrusting their content to a
> commercial outfit or having to pay extra to get it back out. As with
> most things, you pay your money--more or less, depending--and make your choice. And take your risks.
> Good luck with whatever solution(s) you decide on. They need not be
> mutually exclusive.
> Aaron Trehub
> Assistant Dean for Technology and Technical Services Auburn University
> 231 Mell Street, RBD Library
> Auburn, AL 36849-5606
> Phone: (334) 844-1716
> Skype: ajtrehub
> E-mail: [log in to unmask]
> URL: http://lib.auburn.edu/