Kyle, are all your Glacier/S3 assets backed up manually by a person, or is
the process automated as part of an IR software package of some sort?

Joshua Welker
Information Technology Librarian
James C. Kirkpatrick Library
University of Central Missouri
Warrensburg, MO 64093
JCKL 2260
660.543.8022


On Thu, Oct 26, 2017 at 1:48 PM, Kyle Banerjee <[log in to unmask]>
wrote:

> On Thu, Oct 26, 2017 at 7:03 AM, Jonathan Rochkind <[log in to unmask]>
> wrote:
>
> > I think it's actually worth interrogating and getting specific about what
> > we mean by "preservation features".
> >
> > I think they may not actually be all that complicated or hard to add on
> > to nearly any solution.  I think an actual 'repository solution' may
> > actually not be as complicated as people assume when you actually
> > specify it.
> >
> > The main preservation feature people actually use, "fixity", is just
> > taking a checksum of a file (perhaps using SHA1), storing it somewhere,
> > and then later checking to make sure the file still has the same
> > checksum, and alerting if it does not. This is a relatively simple
> > feature to add to any software.
> >
>
> It's also worth considering the function of the checksum. I believe the
> argument that checksums mitigate bit rot on a modern filesystem is weak.
>
> The normal way checksums are implemented presumes the following are immune
> to bit rot:
>
>    - the OS
>    - every dependency for the program code
>    - the interpreter
>    - the checksum itself
>
> Another way of putting it is that the assumption is that bit rot only
> affects very specific types of assets. Fortunately, modern filesystems
> detect and repair errors, which is why they can be trusted for important
> things.
>
> People and rogue processes may intentionally or unintentionally mess things
> up. Checksums are potentially useful here, but there are multiple
> mechanisms that can be used for that purpose. Note that checksums are
> useless against intentional modification if those who modify assets also
> have permissions to modify checksums.
>
> We have a simple approach. We've been moving everything to Amazon Glacier
> for cold storage using a specially configured S3 bucket that allows us to
> use DOIs as keys to retrieve things.  IAM policies are set up to prevent
> undesirable activity, support versioning, etc.
>
> We're also slowly moving towards using S3 for hot storage assets with
> modest IO requirements. This process has been a bit slow because there are
> issues with organizational policy and with people wrapping their minds
> around how S3 and Glacier work -- even a lot of tech people here seem to
> think of them in terms of the disks of yore mounted in a rack somewhere
> else with ordinary files stored and transmitted in cleartext.
>
> Bottom line is that depending on what Josh needs to do, there may well be
> options that are far easier, cheaper, and more reliable than what he has or
> could possibly achieve in an all-in-one solution labeled as a
> "repository." No need to use a chain saw to cut butter.
>
> kyle
>
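
For reference, the "fixity" check Jonathan describes in the quoted text (take a checksum, store it, re-check later, alert on mismatch) might look roughly like the sketch below. This is only an illustration, not anyone's actual setup: the function names and the JSON manifest format are made up for the example, and it uses SHA-256 rather than the SHA1 mentioned in the thread. As Kyle points out, it does nothing against someone who can modify both the files and the manifest.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(root, manifest_path):
    """Checksum every file under root and save the results as JSON.

    Hypothetical manifest format: {"relative/path": "hex digest", ...}.
    Keep the manifest outside root so it is not checksummed itself.
    """
    root = Path(root)
    manifest = {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_checksums(root, manifest_path):
    """Return a list of (relative path, problem) for files that fail the check."""
    root = Path(root)
    manifest = json.loads(Path(manifest_path).read_text())
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel, "changed"))
    return problems
```

Run record_checksums() once at ingest, then run verify_checksums() on a schedule and send an alert whenever it returns a non-empty list.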