Good point. But since campus IT will be creating regular disaster-recovery backups, the odds that we'd ever need to retrieve more than a handful of files from Glacier at a time are pretty low.

Josh Welker

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Gary McGath
Sent: Friday, January 11, 2013 8:03 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Digital collection backups

Concerns have been raised about how expensive Glacier gets if you need to recover a lot of files in a short time period.

On 1/10/13 5:56 PM, Roy Tennant wrote:
> I'd also take a look at Amazon Glacier. Recently I parked about 50GB 
> of data files in logical tar'd and gzip'd chunks and it's costing my 
> employer less than 50 cents/month. Glacier, however, is best for "park 
> it and forget" kinds of needs, as the real cost is in data flow.
> Storage is cheap, but must be considered "offline" or "near line" as 
> you must first request to retrieve a file, wait for about a day, and 
> then retrieve the file. And you're charged more for the download 
> throughput than just about anything.
> I'm using a Unix client to handle all of the heavy lifting of 
> uploading and downloading, as Glacier is meant to be used via an API 
> rather than a web client.[1] If anyone is interested, I have local 
> documentation on usage that I could probably genericize. And yes, I 
> did round-trip a file to make sure it functioned as advertised.
> Roy
> [1]
> On Thu, Jan 10, 2013 at 2:29 PM,  <[log in to unmask]> wrote:
>> We built our own solution for this by creating a plugin that works with our digital asset management system (ResourceSpace) to individually back up files to Amazon S3. Because S3 is replicated to multiple data centers, this provides a fairly high level of redundancy. And because it's an object-based web service, we can access any given object individually by using a URL related to the original storage URL within our system.
>> This also allows us to take advantage of S3 for images on our website. All of the images in our online collections database are being served straight from S3, which diverts the load from our public web server. When we launch zoomable images later this year, all of the tiles will also be generated locally in the DAM and then served to the public via the mirrored copy in S3.
>> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is fairly reasonable for what we're getting. They just dropped the price substantially a few months ago.
>> DuraCloud supposedly offers a way to add another abstraction layer so you can build something like this that is portable between different cloud storage providers. But I haven't really looked into this as of yet.

Gary McGath, Professional Software Developer
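
For anyone comparing the storage figures quoted in this thread, here is a rough sketch of the arithmetic in Python. The S3 rate ($0.08/GB/month) is the one quoted above. The Glacier rate is an assumption, taken as an upper bound implied by "about 50GB" costing "less than 50 cents/month"; it is not stated anywhere in the thread. The function and constant names are made up for illustration. It covers flat-rate storage only and leaves out retrieval and transfer charges, which is where the thread says the real cost is.

```python
def monthly_storage_cost(size_gb: float, rate_per_gb_month: float) -> float:
    """Return the monthly storage cost in dollars at a flat per-GB rate.

    Storage only: retrieval and data-transfer fees are NOT included.
    """
    if size_gb < 0 or rate_per_gb_month < 0:
        raise ValueError("size and rate must be non-negative")
    return size_gb * rate_per_gb_month


# Rate quoted in the thread for S3.
S3_RATE_PER_GB_MONTH = 0.08

# ASSUMPTION: upper bound implied by "about 50GB for less than
# 50 cents/month"; the thread does not state the Glacier rate.
GLACIER_RATE_ASSUMED = 0.01

if __name__ == "__main__":
    size_gb = 50  # the collection size mentioned in the thread
    s3 = monthly_storage_cost(size_gb, S3_RATE_PER_GB_MONTH)
    glacier = monthly_storage_cost(size_gb, GLACIER_RATE_ASSUMED)
    print(f"S3:      ${s3:.2f}/month")
    print(f"Glacier: ${glacier:.2f}/month (assumed rate, upper bound)")
```

Plugging in your own collection size and current published rates gives a quick comparison, but any real estimate for Glacier would also need the retrieval side, which this sketch does not attempt to model.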