Checksums can be kept separate (Tripwire style).

For JHU archiving, the use of MD5 would give false positives for duplicate detection.

There is no reason to use a bad cryptographic hash. Use a fast hash, or use a safe hash.

Simon

On Oct 2, 2014 6:34 PM, "Jonathan Rochkind" <[log in to unmask]> wrote:

> For checksums for ensuring archival integrity, are cryptographic flaws
> relevant? I'm not sure, is part of the point of a checksum to ensure
> against _malicious_ changes to files? I honestly don't know. (But in most
> systems, I'd guess anyone who had access to maliciously change the file
> would also have access to maliciously change the checksum!)
>
> ROT13 is not suitable as a checksum for ensuring archival integrity,
> however, because its output is no smaller than its input, which is kind
> of what you're looking for.
>
> ________________________________________
> From: Code for Libraries [[log in to unmask]] on behalf of Cary
> Gordon [[log in to unmask]]
> Sent: Thursday, October 02, 2014 5:51 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] What is the real impact of SHA-256? - Updated
>
> +1
>
> MD5 is little better than ROT13. At least with ROT13, you have no
> illusions.
>
> We use SHA-512 for most work. We don't do finance or national security, so
> it is a good fit for us.
>
> Cary
>
> On Oct 2, 2014, at 12:30 PM, Simon Spero <[log in to unmask]> wrote:
>
> > Intel Skylake processors have dedicated SHA instructions.
> > See: https://software.intel.com/en-us/articles/intel-sha-extensions
> >
> > Using a tree hash approach (which is inherently embarrassingly parallel)
> > will leave I/O time dominant. This approach is used by Amazon Glacier - see
> > http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html
> >
> > MD5 is broken, and cannot be used for any security purposes. It cannot be
> > used for deduplication if any of the files are in the directories of
> > security researchers!
> >
> > If security is not a concern then there are many faster hashing algorithms
> > that avoid the costs imposed by the need to defend against adversaries.
> > See siphash, murmur, cityhash, etc.
> >
> > Simon
> > On Oct 2, 2014 11:18 AM, "Alex Duryee" <[log in to unmask]> wrote:
> >
> >> Despite some of its relative flaws, MD5 is frequently selected over SHA-256
> >> in archives as the checksum algorithm of choice. One of the primary factors
> >> here is the longer processing time required for SHA-256, though there have
> >> been no empirical studies calculating that time difference and its overall
> >> impact on checksum generation and verification in a preservation
> >> environment.
> >>
> >> AVPreserve Consultant Alex Duryee recently ran a series of tests comparing
> >> the real time and cpu time used by each algorithm. His newly updated white
> >> paper "What Is the Real Impact of SHA-256?" presents the results and comes
> >> to some interesting conclusions regarding the actual time difference
> >> between the two and what other factors may have a greater impact on your
> >> selection decision and file monitoring workflow. The paper can be
> >> downloaded for free at
> >>
> >> http://www.avpreserve.com/papers-and-presentations/whats-the-real-impact-of-sha-256/
> >> .
> >> ______________________________________
> >>
> >> Alex Duryee
> >> *AVPreserve*
> >> 350 7th Ave., Suite 1605
> >> New York, NY 10001
> >>
> >> office: 917-475-9630
> >>
> >> http://www.avpreserve.com
> >> Facebook.com/AVPreserve <http://facebook.com/AVPreserve>
> >> twitter.com/AVPreserve
> >>
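
For anyone following along, the tree hash approach mentioned in the thread can be sketched roughly as below. This is only an illustration, not anyone's production code: the function names (`leaf_digests`, `tree_hash`) are made up for this example, and the 1 MiB leaf size and SHA-256 pairing rule are assumptions modelled on the Glacier checksum documentation linked above. Each leaf digest is independent of the others, which is why the leaf stage parallelizes so easily.

```python
import hashlib
from typing import List

# Assumed leaf size: 1 MiB, following the Glacier checksum documentation.
CHUNK_SIZE = 1024 * 1024


def leaf_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """SHA-256 digest of each fixed-size chunk (the leaves of the tree)."""
    if not data:
        # Empty input: a single leaf holding the hash of the empty string.
        return [hashlib.sha256(b"").digest()]
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def tree_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Combine leaf digests pairwise until a single root digest remains."""
    level = leaf_digests(data, chunk_size)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Hash the concatenation of two adjacent digests.
                pair = level[i] + level[i + 1]
                next_level.append(hashlib.sha256(pair).digest())
            else:
                # An unpaired digest is carried up to the next level unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0].hex()


if __name__ == "__main__":
    # Tiny chunk size so the tree has more than one level.
    print(tree_hash(b"abcdefghij", chunk_size=4))
```

For input that fits in a single chunk, the result is just the plain SHA-256 of the data. A real implementation would read the file in chunks rather than holding it all in memory, and could hand the leaf digests to a pool of workers.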