My concern is more this: given the proven weaknesses in MD5, do I want to risk that one-in-a-billion chance that the “right” bit error creeps into an archive without changing the checksum, creating the illusion that the archive's integrity has not been violated?

-- 
Andrew Anderson, Director of Development, Library and Information Resources Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | http://www.facebook.com/LIRNnotes

On Oct 2, 2014, at 18:34, Jonathan Rochkind <[log in to unmask]> wrote:

> For checksums used to ensure archival integrity, are cryptographic flaws relevant? I'm not sure: is part of the point of a checksum to guard against _malicious_ changes to files? I honestly don't know. (But in most systems, I'd guess anyone who had access to maliciously change the file would also have access to maliciously change the checksum!)
>
> ROT13 is not suitable as a checksum for ensuring archival integrity, however, because its output is no smaller than its input, and a compact fingerprint is kind of what you're looking for.
>
> ________________________________________
> From: Code for Libraries [[log in to unmask]] on behalf of Cary Gordon [[log in to unmask]]
> Sent: Thursday, October 02, 2014 5:51 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] What is the real impact of SHA-256? - Updated
>
> +1
>
> MD5 is little better than ROT13. At least with ROT13, you have no illusions.
>
> We use SHA-512 for most work. We don't do finance or national security, so it is a good fit for us.
>
> Cary
>
> On Oct 2, 2014, at 12:30 PM, Simon Spero <[log in to unmask]> wrote:
>
>> Intel Skylake processors have dedicated SHA instructions.
>> See: https://software.intel.com/en-us/articles/intel-sha-extensions
>>
>> Using a tree hash approach (which is inherently embarrassingly parallel) will leave I/O time dominant. This approach is used by Amazon Glacier; see
>> http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html
>>
>> MD5 is broken and cannot be used for any security purpose. It cannot even be used for deduplication if any of the files are in the directories of security researchers!
>>
>> If security is not a concern, there are many faster hashing algorithms that avoid the costs imposed by the need to defend against adversaries. See SipHash, MurmurHash, CityHash, etc.
>>
>> Simon
>> On Oct 2, 2014 11:18 AM, "Alex Duryee" <[log in to unmask]> wrote:
>>
>>> Despite some of its relative flaws, MD5 is frequently selected over SHA-256 in archives as the checksum algorithm of choice. One of the primary factors here is the longer processing time required for SHA-256, though there have been no empirical studies calculating that time difference and its overall impact on checksum generation and verification in a preservation environment.
>>>
>>> AVPreserve Consultant Alex Duryee recently ran a series of tests comparing the real time and CPU time used by each algorithm. His newly updated white paper "What Is the Real Impact of SHA-256?" presents the results and comes to some interesting conclusions regarding the actual time difference between the two and what other factors may have a greater impact on your selection decision and file monitoring workflow. The paper can be downloaded for free at:
>>>
>>> http://www.avpreserve.com/papers-and-presentations/whats-the-real-impact-of-sha-256/
>>> ______________________________________
>>>
>>> Alex Duryee
>>> *AVPreserve*
>>> 350 7th Ave., Suite 1605
>>> New York, NY 10001
>>>
>>> office: 917-475-9630
>>>
>>> http://www.avpreserve.com
>>> Facebook.com/AVPreserve
>>> twitter.com/AVPreserve
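Andrew's concern can be checked with a small experiment. The published MD5 attacks produce collisions between inputs that an attacker crafts deliberately; nothing about them makes an accidental, random bit flip more likely to leave the digest unchanged. Below is a minimal Python sketch, using only the standard-library hashlib module, that simulates bit rot by inverting one bit of an in-memory buffer and reports whether the MD5, SHA-256 and SHA-512 digests change. The helper names digest and flip_bit, the payload, and the bit position are hypothetical, chosen only for illustration.

```python
import hashlib


def digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of data using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated bit rot)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    # Hypothetical 30,000-byte payload standing in for an archived file.
    original = b"archival master file contents " * 1000
    corrupted = flip_bit(original, bit_index=12345)
    for algo in ("md5", "sha256", "sha512"):
        before = digest(original, algo)
        after = digest(corrupted, algo)
        print(f"{algo:7s} digest changed: {before != after}")
```

In a real fixity workflow, digest would be computed over file contents read from storage and compared with the value recorded at ingest.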
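Simon's tree-hash suggestion can be illustrated with a minimal sketch modeled on the Amazon Glacier scheme described at the AWS page he links: hash fixed-size chunks independently (the embarrassingly parallel part), then hash the concatenation of each adjacent pair of child digests, level by level, until one root digest remains. The 1 MiB chunk size and the rule that an unpaired last node is carried up unchanged follow our reading of that documentation; treating empty input as a single empty chunk is an assumption. The names tree_hash and CHUNK_SIZE are hypothetical. The sketch uses a thread pool for the leaf hashes because CPython's hashlib releases the GIL while hashing large buffers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Leaf size taken from our reading of the Glacier docs (assumption).
CHUNK_SIZE = 1024 * 1024  # 1 MiB


def _sha256(data: bytes) -> bytes:
    """Raw (binary) SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def tree_hash(data: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4) -> str:
    """Return the hex SHA-256 tree hash of data (Glacier-style sketch)."""
    # Split into fixed-size leaves; empty input becomes one empty leaf.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]

    # Leaf digests are independent of each other, so hash them concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        level = list(pool.map(_sha256, chunks))

    # Combine adjacent pairs; an unpaired last digest moves up unchanged.
    while len(level) > 1:
        next_level = [
            _sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level

    return level[0].hex()


if __name__ == "__main__":
    sample = b"x" * (3 * CHUNK_SIZE + 10)  # four leaves: three full, one partial
    print(tree_hash(sample))
```

For input no larger than one chunk, the tree hash reduces to a plain SHA-256 of the data. A production version would read chunks from disk as it goes rather than holding the whole file in memory.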
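Alex's white paper compares the real (wall-clock) time and CPU time used by MD5 and SHA-256. One rough way to run that kind of comparison on your own hardware, sketched here in Python as an assumption and not as the paper's methodology, is to time hashlib over an in-memory buffer with time.perf_counter (wall clock) and time.process_time (CPU time). Because the data is already in memory, this measures hashing cost only; a real fixity run also reads from storage, which Simon suggests dominates once hashing is fast. The function name time_hash and the 32 MiB payload size are arbitrary choices.

```python
from __future__ import annotations

import hashlib
import os
import time


def time_hash(algorithm: str, data: bytes, repeats: int = 3) -> tuple[float, float]:
    """Return (best wall-clock seconds, best CPU seconds) to hash data."""
    best_wall = best_cpu = float("inf")
    for _ in range(repeats):
        wall0, cpu0 = time.perf_counter(), time.process_time()
        hashlib.new(algorithm, data).hexdigest()
        best_wall = min(best_wall, time.perf_counter() - wall0)
        best_cpu = min(best_cpu, time.process_time() - cpu0)
    return best_wall, best_cpu


if __name__ == "__main__":
    payload = os.urandom(32 * 1024 * 1024)  # 32 MiB of random bytes (arbitrary size)
    for algo in ("md5", "sha256", "sha512"):
        wall, cpu = time_hash(algo, payload)
        mib_per_s = len(payload) / (1024 * 1024) / wall
        print(f"{algo:7s} wall={wall:.3f}s cpu={cpu:.3f}s (~{mib_per_s:.0f} MiB/s)")
```

Results depend heavily on the CPU and Python build, so the numbers are only meaningful relative to each other on the same machine.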