Checksums can be kept separate (tripwire style).
For JHU archiving, the use of MD5 would give false positives for duplicate
detection.
There is no reason to use a bad cryptographic hash. Use a fast hash, or use
a safe hash.
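
A minimal sketch of keeping checksums separate, assuming SHA-256 as the "safe hash" and a manifest stored somewhere other than the archive itself. The function names (sha256_file, build_manifest, changed_files) are made up for illustration and are not taken from Tripwire or any other tool:

```python
import hashlib
from pathlib import Path


def sha256_file(path, bufsize=1 << 20):
    """Hash a file in fixed-size blocks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed, or altered since the manifest was built."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

The manifest would be written to separate (ideally read-only) storage, so that someone who can alter a file cannot also alter its recorded checksum.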
Simon
On Oct 2, 2014 6:34 PM, "Jonathan Rochkind" <[log in to unmask]> wrote:
> For checksums for ensuring archival integrity, are cryptographic flaws
> relevant? I'm not sure: is part of the point of a checksum to ensure
> against _malicious_ changes to files? I honestly don't know. (But in most
> systems, I'd guess anyone who had access to maliciously change the file
> would also have access to maliciously change the checksum!)
>
> Rot13 is not suitable as a checksum for ensuring archival integrity,
> however, because its output is no smaller than its input, and a smaller
> output is kind of what you're looking for.
>
> ________________________________________
> From: Code for Libraries [[log in to unmask]] on behalf of Cary
> Gordon [[log in to unmask]]
> Sent: Thursday, October 02, 2014 5:51 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] What is the real impact of SHA-256? - Updated
>
> +1
>
> MD5 is little better than ROT13. At least with ROT13, you have no
> illusions.
>
> We use SHA-512 for most work. We don't do finance or national security, so
> it is a good fit for us.
>
> Cary
>
> On Oct 2, 2014, at 12:30 PM, Simon Spero <[log in to unmask]> wrote:
>
> > Intel Skylake processors are expected to have dedicated SHA instructions.
> > See: https://software.intel.com/en-us/articles/intel-sha-extensions
> >
> > Using a tree hash approach (which is inherently embarrassingly parallel)
> > will leave I/O time dominant. This approach is used by Amazon Glacier; see
> > http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html
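
A rough sketch of the tree hash idea, following the scheme described in the Glacier documentation linked above (SHA-256 over 1 MiB chunks, then pairwise hashing of concatenated digests up to a single root). The names CHUNK and tree_hash are illustrative only, and for simplicity this version takes the whole input in memory and hashes the chunks serially:

```python
import hashlib

CHUNK = 1024 * 1024  # 1 MiB chunks, as in the Glacier documentation


def tree_hash(data: bytes) -> str:
    """Return the hex root of a SHA-256 tree hash over 1 MiB chunks of data."""
    # Leaf level: hash each chunk on its own. These hashes are independent
    # of one another, which is what makes the approach easy to parallelise.
    level = [
        hashlib.sha256(data[i:i + CHUNK]).digest()
        for i in range(0, len(data), CHUNK)
    ] or [hashlib.sha256(b"").digest()]  # empty input: hash of nothing

    # Combine adjacent digests pairwise until only the root is left.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                parents.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                parents.append(pair[0])  # odd digest is promoted unchanged
        level = parents
    return level[0].hex()
```

For input of 1 MiB or less there is only one leaf, so the result is the plain SHA-256 of the data. A real implementation would read chunks from disk and could hand the leaf hashes to a worker pool.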
> >
> > MD5 is broken, and cannot be used for any security purposes. It cannot be
> > used for deduplication if any of the files are in the directories of
> > security researchers!
> >
> > If security is not a concern then there are many faster hashing algorithms
> > that avoid the costs imposed by the need to defend against adversaries.
> > See SipHash, MurmurHash, CityHash, etc.
> >
> > Simon
> > On Oct 2, 2014 11:18 AM, "Alex Duryee" <[log in to unmask]> wrote:
> >
> >> Despite some of its relative flaws, MD5 is frequently selected over SHA-256
> >> in archives as the checksum algorithm of choice. One of the primary factors
> >> here is the longer processing time required for SHA-256, though there have
> >> been no empirical studies calculating that time difference and its overall
> >> impact on checksum generation and verification in a preservation
> >> environment.
> >>
> >> AVPreserve Consultant Alex Duryee recently ran a series of tests comparing
> >> the real time and CPU time used by each algorithm. His newly updated white
> >> paper "What Is the Real Impact of SHA-256?" presents the results and comes
> >> to some interesting conclusions regarding the actual time difference
> >> between the two and what other factors may have a greater impact on your
> >> selection decision and file monitoring workflow. The paper can be
> >> downloaded for free at
> >> http://www.avpreserve.com/papers-and-presentations/whats-the-real-impact-of-sha-256/
> >> ______________________________________
> >>
> >> Alex Duryee
> >> *AVPreserve*
> >> 350 7th Ave., Suite 1605
> >> New York, NY 10001
> >>
> >> office: 917-475-9630
> >>
> >> http://www.avpreserve.com
> >> Facebook.com/AVPreserve <http://facebook.com/AVPreserve>
> >> twitter.com/AVPreserve
> >>
>