We use LOCKSS as part of MetaArchive. LOCKSS, as I understand it, is
typically spec'd for consumer-grade hardware, and so, presumably as a
result of the SE Asia flooding, there have been some drive failures, cache
downtimes, and adjustments accordingly.

However, to get the worst of it out of the way first: that is the extent of it.

LOCKSS is, to some perhaps even considerable degree, tamper-resistant,
since it relies on collective polling among multiple copies to preserve
integrity, as opposed to static checksums or some other single-point
solution.
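The polling idea can be sketched roughly as follows. This is a deliberately simplified illustration, not the actual LOCKSS protocol (real polls use nonced hashes over content blocks and a more involved voting scheme; all names here are hypothetical):

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    """Hash one cache's copy of a preserved file."""
    return hashlib.sha256(content).hexdigest()

def poll(copies: dict[str, bytes]) -> dict[str, bool]:
    """Return {cache_name: agrees_with_majority} for each replica.

    A copy that disagrees with the majority digest is treated as
    damaged and would be repaired from an agreeing peer.
    """
    votes = {name: digest(data) for name, data in copies.items()}
    majority_hash, _ = Counter(votes.values()).most_common(1)[0]
    return {name: (h == majority_hash) for name, h in votes.items()}

# Four hypothetical caches holding the same file; one is corrupted.
caches = {
    "cache-a": b"original content",
    "cache-b": b"original content",
    "cache-c": b"tampered content",  # silently corrupted copy
    "cache-d": b"original content",
}
print(poll(caches))  # cache-c loses the poll; the others agree
```

The point of the majority vote, as opposed to a stored checksum, is that there is no single reference value an attacker (or bit rot) can silently alter: integrity is established by live agreement among independently held copies.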

As such, it seems to me important to run a LOCKSS box alongside other
LOCKSS boxes; the MetaArchive cooperative specifies six or so distributed
locations for each cache.

The economic sustainability of such an enterprise is a valid question.
David S. H. Rosenthal at Stanford seems to lead the charge on this research.

e.g. http://blog.dshr.org/2012/08/amazons-announcement-of-glacier.html#more

I've heard mention from other players that they watch MA carefully for
such sustainability considerations, especially because MA uses LOCKSS for
non-journal content. In some sense this may extend LOCKSS beyond its
original design.

MetaArchive has, in my opinion, been extremely responsible in designating
succession and disaster recovery scenarios, going so far as to fund,
develop, and test services for migration out of the system, into an
iRODS repository in the initial case.


Al Matthews
AUC Robert W. Woodruff Library

On 1/11/13 9:10 AM, "Joshua Welker" <[log in to unmask]> wrote:

>Good point. But since campus IT will be creating regular
>disaster-recovery backups, the odds that we'd ever need to retrieve
>more than a handful of files from Glacier at a time are pretty low.
>
>Josh Welker
>
>
>-----Original Message-----
>From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>Gary McGath
>Sent: Friday, January 11, 2013 8:03 AM
>To: [log in to unmask]
>Subject: Re: [CODE4LIB] Digital collection backups
>
>Concerns have been raised about how expensive Glacier gets if you need to
>recover a lot of files in a short time period.
>
>http://www.wired.com/wiredenterprise/2012/08/glacier/
>
>On 1/10/13 5:56 PM, Roy Tennant wrote:
>> I'd also take a look at Amazon Glacier. Recently I parked about 50GB
>> of data files in logical tar'd and gzip'd chunks and it's costing my
>> employer less than 50 cents/month. Glacier, however, is best for "park
>> it and forget" kinds of needs, as the real cost is in data flow.
>> Storage is cheap, but must be considered "offline" or "near line" as
>> you must first request to retrieve a file, wait for about a day, and
>> then retrieve the file. And you're charged more for the download
>> throughput than just about anything.
>>
>> I'm using a Unix client to handle all of the heavy lifting of
>> uploading and downloading, as Glacier is meant to be used via an API
>> rather than a web client.[1] If anyone is interested, I have local
>> documentation on usage that I could probably genericize. And yes, I
>> did round-trip a file to make sure it functioned as advertised.
>> Roy
>>
>> [1] https://github.com/vsespb/mt-aws-glacier
>>
>> On Thu, Jan 10, 2013 at 2:29 PM,  <[log in to unmask]>
>>wrote:
>>> We built our own solution for this by creating a plugin that works
>>>with our digital asset management system (ResourceSpace) to individually
>>>back up files to Amazon S3. Because S3 is replicated to multiple data
>>>centers, this provides a fairly high level of redundancy. And because
>>>it's an object-based web service, we can access any given object
>>>individually by using a URL related to the original storage URL within
>>>our system.
>>>
>>> This also allows us to take advantage of S3 for images on our website.
>>>All of the images from our online collections database are being
>>>served straight from S3, which diverts the load from our public web
>>>server. When we launch zoomable images later this year, all of the
>>>tiles will also be generated locally in the DAM and then served to the
>>>public via the mirrored copy in S3.
>>>
>>> The current pricing is around $0.08/GB/month for 1-50 TB, which I
>>>think is fairly reasonable for what we're getting. They just dropped
>>>the price substantially a few months ago.
>>>
>>> DuraCloud http://www.duracloud.org/ supposedly offers a way to add
>>>another abstraction layer so you can build something like this that is
>>>portable between different cloud storage providers. But I haven't
>>>really looked into this as of yet.
>
>
>--
>Gary McGath, Professional Software Developer http://www.garymcgath.com

