LISTSERV 16.5 - CODE4LIB Archives

Thanks, Al. I think we'd join a LOCKSS network rather than run multiple LOCKSS boxes ourselves. Does anyone have any experience with one of those, like the LOCKSS Global Alliance?

Josh Welker


-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Al Matthews
Sent: Friday, January 11, 2013 8:50 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Digital collection backups

We use LOCKSS as part of MetaArchive. LOCKSS, as I understand it, is typically spec'd for consumer hardware, and so, presumably as a result of the SE Asia flooding, there have been some drive failures and cache downtimes, with adjustments made accordingly.

That, however, is the worst of it, and I mention it first.

LOCKSS is to some, perhaps even considerable, degree tamper-resistant, since it relies on mechanisms of collective polling among multiple copies to preserve integrity, as opposed to static checksums or some other solution.
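
To illustrate the difference from a static checksum, here is a minimal sketch of majority voting among copies. It is not the actual LOCKSS polling protocol, which is considerably more involved; the names poll, repair, and cache-N are hypothetical and chosen only for this example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one copy's content."""
    return hashlib.sha256(content).hexdigest()


def poll(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Hold a simple majority vote over the digests of all copies.

    Returns the winning digest and the sorted names of the peers whose
    copy disagrees with it. No stored reference checksum is consulted:
    the copies are only compared with each other.
    """
    votes = {peer: digest(data) for peer, data in copies.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count <= len(votes) / 2:
        raise ValueError("no majority; cannot tell which copy is intact")
    dissenters = sorted(peer for peer, d in votes.items() if d != winner)
    return winner, dissenters


def repair(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Give every peer the content that won the vote."""
    winner, _ = poll(copies)
    good = next(data for data in copies.values() if digest(data) == winner)
    return {peer: good for peer in copies}


if __name__ == "__main__":
    # Hypothetical example: six caches, one of which has a damaged copy.
    copies = {f"cache-{i}": b"page scan 0001" for i in range(1, 7)}
    copies["cache-3"] = b"page scan 0001 (bit rot)"
    _, damaged = poll(copies)
    print("damaged copies:", damaged)
```

With a single stored checksum, whoever can alter both the file and the checksum can hide the change. With a vote, an altered copy is outnumbered unless a majority of the caches are altered in the same way, which is why the number of independent caches matters.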

As such, it seems to me important to run a LOCKSS box alongside other LOCKSS boxes; the MA cooperative specifies six or so distributed locations for each cache.

The economic sustainability of such an enterprise is a valid question.
David S H Rosenthal at Stanford seems to lead the charge for this research.

e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard mention from other players that they watch MA carefully for such sustainability considerations, especially because MA uses LOCKSS for non-journal content. In some sense this may extend LOCKSS beyond its original design.

MetaArchive has, in my opinion, been extremely responsible in designating succession scenarios and disaster recovery scenarios, going so far as to fund, develop and test services for migration out of the system, into an iRODS repository in the initial case.


Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, "Joshua Welker" <[log in to unmask]> wrote:

>Good point. But since campus IT will be creating regular 
>disaster-recovery backups, the odds that we'd ever need to 
>retrieve more than a handful of files from Glacier at a time are pretty low.
>
>Josh Welker
>
>
>-----Original Message-----
>From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of 
>Gary McGath
>Sent: Friday, January 11, 2013 8:03 AM
>To: [log in to unmask]
>Subject: Re: [CODE4LIB] Digital collection backups
>
>Concerns have been raised about how expensive Glacier gets if you need 
>to recover a lot of files in a short time period.
>
>http://www.wired.com/wiredenterprise/2012/08/glacier/
>
>On 1/10/13 5:56 PM, Roy Tennant wrote:
>> I'd also take a look at Amazon Glacier. Recently I parked about 50GB 
>> of data files in logical tar'd and gzip'd chunks and it's costing my 
>> employer less than 50 cents/month. Glacier, however, is best for 
>> "park it and forget" kinds of needs, as the real cost is in data flow.
>> Storage is cheap, but must be considered "offline" or "near line" as 
>> you must first request to retrieve a file, wait for about a day, and 
>> then retrieve the file. And you're charged more for the download 
>> throughput than just about anything.
>>
>> I'm using a Unix client to handle all of the heavy lifting of 
>> uploading and downloading, as Glacier is meant to be used via an API 
>> rather than a web client.[1] If anyone is interested, I have local 
>> documentation on usage that I could probably genericize. And yes, I 
>> did round-trip a file to make sure it functioned as advertised.
>> Roy
>>
>> [1] https://github.com/vsespb/mt-aws-glacier
>>
>> On Thu, Jan 10, 2013 at 2:29 PM,  <[log in to unmask]>
>>wrote:
>>> We built our own solution for this by creating a plugin that works 
>>>with our digital asset management system (ResourceSpace) to 
>>>individually back up files to Amazon S3. Because S3 is replicated to 
>>>multiple data centers, this provides a fairly high level of 
>>>redundancy. And because it's an object-based web service, we can 
>>>access any given object individually by using a URL related to the 
>>>original storage URL within our system.
>>>
>>> This also allows us to take advantage of S3 for images on our website.
>>>All of the images from our online collections database are being 
>>>served straight from S3, which diverts the load from our public web 
>>>server. When we launch zoomable images later this year, all of the 
>>>tiles will also be generated locally in the DAM and then served to 
>>>the public via the mirrored copy in S3.
>>>
>>> The current pricing is around $0.08/GB/month for 1-50 TB, which I 
>>>think is fairly reasonable for what we're getting. They just dropped 
>>>the price substantially a few months ago.
>>>
>>> DuraCloud http://www.duracloud.org/ supposedly offers a way to add 
>>>another abstraction layer so you can build something like this that 
>>>is portable between different cloud storage providers. But I haven't 
>>>really looked into this as of yet.
>
>
>--
>Gary McGath, Professional Software Developer http://www.garymcgath.com
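
The storage prices quoted further down this thread can be sanity-checked with a little arithmetic. This is only a rough sketch: the S3 rate is the 1-50 TB figure given in the thread, while the Glacier rate of $0.01/GB/month is an assumption (the list price at launch, not stated in the thread). Retrieval and transfer charges, which the thread flags as the real cost of Glacier, are deliberately left out.

```python
# Monthly storage rates in USD per GB.
GLACIER_PER_GB_MONTH = 0.01  # assumption: launch list price, not from the thread
S3_PER_GB_MONTH = 0.08       # figure quoted in the thread for the 1-50 TB tier


def monthly_storage_cost(gigabytes: float, rate_per_gb_month: float) -> float:
    """Return the storage-only cost for one month, ignoring retrieval fees."""
    return gigabytes * rate_per_gb_month


if __name__ == "__main__":
    size_gb = 50  # roughly the amount Roy describes parking in Glacier
    glacier = monthly_storage_cost(size_gb, GLACIER_PER_GB_MONTH)
    s3 = monthly_storage_cost(size_gb, S3_PER_GB_MONTH)
    print(f"Glacier: ${glacier:.2f}/month, S3: ${s3:.2f}/month")
```

At the assumed rate, about 50 GB comes to roughly 50 cents a month in Glacier, in line with Roy's figure, against a few dollars a month in S3 at the quoted tier.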

