LISTSERV 16.5 - CODE4LIB Archives

I’m not sure I understand the prior comment about compression.

I agree that hashing workflows are neither simple nor in themselves secure. I agree with the implication that they can explode in scope.

From what I can tell, the state of hashing verification tools reflects substantial confusion over their utility and purpose. In some ways it’s a quixotic attempt to re-invent LOCKSS or equivalent. In other ways it’s perfectly sensible.

I think that the move to evaluate SHA-256 reflects some clear concern over tampering (as does, for example, the history of LOCKSS itself). This is not to say that MD5 collisions (much less substitutions) are mathematically trivial, but rather that they are now commonly contemplated.

Compare Bruce Schneier’s comments about abandoning SHA-1 entirely, or computation’s reliance on Cyclic Redundancy Checks. In many ways it’s an InfoSec consideration dropped in the middle of archival or library workflow specification.

--
Al Matthews
Software Developer, Digital Services Unit
Atlanta University Center, Robert W. Woodruff Library
email: [log in to unmask]; office: 1 404 978 2057


From: Charles Blair <[log in to unmask]>
Organization: The University of Chicago Library
Reply-To: [log in to unmask]
Date: Friday, October 3, 2014 at 10:26 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] What is the real impact of SHA-256? - Updated

Look at slide 15 here:
http://www.slideshare.net/DuraSpace/sds-cwebinar-1

I think we're worried about the cumulative effect over time of
undetected errors (at least, I am).

On Fri, Oct 03, 2014 at 05:37:14AM -0700, Kyle Banerjee wrote:
On Thu, Oct 2, 2014 at 3:47 PM, Simon Spero <[log in to unmask]> wrote:

> Checksums can be kept separate (tripwire style).
> For JHU archiving, the use of MD5 would give false positives for duplicate
> detection.
>
> There is no reason to use a bad cryptographic hash. Use a fast hash, or use
> a safe hash.
>
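
A minimal sketch of the "checksums kept separate" idea, assuming SHA-256 as the safe hash. The function names (`sha256_of`, `build_manifest`, `verify`) are hypothetical and not taken from Tripwire or any other tool. The point is only that the manifest is a plain mapping that can be stored somewhere the people who write to the repository cannot reach.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Compare the tree at root against a previously stored manifest.

    Returns three sorted lists: changed, missing, and added paths.
    """
    current = build_manifest(root)
    changed = sorted(k for k in manifest if k in current and current[k] != manifest[k])
    missing = sorted(k for k in manifest if k not in current)
    added = sorted(k for k in current if k not in manifest)
    return changed, missing, added
```

The manifest could be serialized (for example as JSON) and kept on separate or read-only storage. If it lives next to the files with the same permissions, it detects accidental corruption but not deliberate tampering.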

I have always been puzzled why so much energy is expended on bit integrity
in the library and archival communities. Hashing does not accommodate
modification of internal metadata or compression which do not compromise
integrity. And if people who can access the files can also access the
hashes, there is no contribution to security. Also, wholesale hashing of
repositories scales poorly. My guess is that the biggest threats are staff
error or rogue processes (i.e. bad programming). Any malicious
destruction/modification is likely to be an inside job.

In reality, using file size alone is probably sufficient for detecting
changed files -- if dup detection is desired, then hashing the few that dup
out can be performed. Though if dups are an actual issue, it reflects
problems elsewhere. Thrashing disks and cooking the CPU for the purposes
libraries use hashes for seems way overkill, especially given that basic
interaction with repositories for depositors, maintainers, and users is
still in a very primitive state.
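
A rough sketch of the size-first approach described above: group files by size, and hash only those that share a size with at least one other file. The names `file_digest` and `find_duplicates` are hypothetical, and SHA-256 is an assumption; any hash could be substituted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of files under root that have identical content."""
    # Pass 1: cheap grouping by file size (metadata only, no file reads).
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Pass 2: hash only the files whose size collides with another file's.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for p in paths:
            by_hash[(size, file_digest(p))].append(p)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Files with a unique size are never read, which is where the savings come from. Note that this only addresses duplicate detection: a file corrupted or altered without a change in length would still need a content hash to be noticed.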

kyle


--
Charles Blair, Director, Digital Library Development Center, University of Chicago Library
1 773 702 8459 | [log in to unmask] | http://www.lib.uchicago.edu/~chas/

