LISTSERV 16.5 - CODE4LIB Archives

Glacier sounds even better than S3 for what we're looking for. We are only going to be retrieving the files in the case of corruption, so the pay-per-retrieval model would work well. I had heard of Glacier in the past but forgot all about it. Thank you.

Josh Welker


-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Roy Tennant
Sent: Thursday, January 10, 2013 4:56 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Digital collection backups

I'd also take a look at Amazon Glacier. Recently I parked about 50GB of data files in logical tar'd and gzip'd chunks and it's costing my employer less than 50 cents/month. Glacier, however, is best for "park it and forget" kinds of needs, as the real cost is in data flow.
Storage is cheap, but must be considered "offline" or "near line" as you must first request to retrieve a file, wait for about a day, and then retrieve the file. And you're charged more for the download throughput than just about anything.
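
To put rough numbers on the storage side, here is a minimal Python sketch of the monthly arithmetic. The $0.08/GB/month S3 rate is the figure quoted further down in this thread; the $0.01/GB/month Glacier rate is an assumption chosen to be consistent with the roughly 50 cents/month for 50GB mentioned above. Retrieval and transfer charges are deliberately left out, since they depend on how much data you pull back and how quickly.

```python
# Hypothetical helper: flat per-GB monthly storage cost, ignoring
# retrieval, request, and transfer fees.
def monthly_storage_cost(size_gb, rate_per_gb_month):
    return size_gb * rate_per_gb_month

S3_RATE = 0.08       # USD/GB/month, figure quoted in this thread
GLACIER_RATE = 0.01  # USD/GB/month, ASSUMPTION (not stated in this thread)

if __name__ == "__main__":
    for name, rate in (("S3", S3_RATE), ("Glacier", GLACIER_RATE)):
        cost = monthly_storage_cost(50, rate)
        print(f"{name}: ${cost:.2f}/month for 50 GB")
```

Check current pricing before relying on either rate; both have changed over time.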

I'm using a Unix client to handle all of the heavy lifting of uploading and downloading, as Glacier is meant to be used via an API rather than a web client.[1] If anyone is interested, I have local documentation on usage that I could probably genericize. And yes, I did round-trip a file to make sure it functioned as advertised.
Roy

[1] https://github.com/vsespb/mt-aws-glacier
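
For anyone wanting to try the tar-and-gzip chunking and the round-trip check described above, a minimal Python sketch using only the standard library might look like this. The function names and paths are hypothetical, and the actual upload and download are left to whatever Glacier client you use. The sketch only bundles a directory and compares SHA-256 digests of the archive before upload and after retrieval.

```python
import hashlib
import tarfile
from pathlib import Path

def make_chunk(src_dir, out_path):
    """Bundle src_dir into a gzip-compressed tar archive at out_path."""
    src = Path(src_dir)
    with tarfile.open(str(out_path), "w:gz") as tar:
        # arcname stores paths relative to the directory's own name
        tar.add(str(src), arcname=src.name)
    return Path(out_path)

def sha256_of(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def round_trip_ok(original_archive, retrieved_archive):
    """True if the retrieved copy is byte-identical to the uploaded one."""
    return sha256_of(original_archive) == sha256_of(retrieved_archive)
```

Recording the digest at upload time, alongside whatever identifier the client returns, lets you run the same comparison later without keeping the original archive around.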

On Thu, Jan 10, 2013 at 2:29 PM,  <[log in to unmask]> wrote:
> We built our own solution for this by creating a plugin that works with our digital asset management system (ResourceSpace) to individually back up files to Amazon S3. Because S3 is replicated to multiple data centers, this provides a fairly high level of redundancy. And because it's an object-based web service, we can access any given object individually by using a URL related to the original storage URL within our system.
>
> This also allows us to take advantage of S3 for images on our website. All of the images from our online collections database are being served straight from S3, which diverts the load from our public web server. When we launch zoomable images later this year, all of the tiles will also be generated locally in the DAM and then served to the public via the mirrored copy in S3.
>
> The current pricing is around $0.08/GB/month for 1-50 TB, which I think is fairly reasonable for what we're getting. They just dropped the price substantially a few months ago.
>
> DuraCloud (http://www.duracloud.org/) supposedly offers a way to add another abstraction layer so you can build something like this that is portable between different cloud storage providers, but I haven't really looked into it yet.
>
> -David
>
>
> __________
>
> David Dwiggins
> Systems Librarian/Archivist, Historic New England
> 141 Cambridge Street, Boston, MA 02114
> (617) 994-5948
> [log in to unmask]
> http://www.historicnewengland.org
>>>> Joshua Welker <[log in to unmask]> 1/10/2013 5:20 PM >>>
> Hi everyone,
>
> We are starting a digitization project for some of our special collections, and we are having a hard time setting up a backup system that meets the long-term preservation needs of digital archives. The backup mechanisms currently used by campus IT are short-term full-server backups. What we are looking for is more granular, file-level backup over the very long term. Does anyone have any recommendations for software, a service, or a technique? We are looking into LOCKSS but haven't dug too deeply yet. Can anyone who uses LOCKSS tell me a bit about their experience with it?
>
> Josh Welker
> Electronic/Media Services Librarian
> College Liaison
> University Libraries
> Southwest Baptist University
> 417.328.1624