Intel Skylake processors have dedicated SHA instructions.
See: https://software.intel.com/en-us/articles/intel-sha-extensions
Using a tree hash approach (which is embarrassingly parallel) will leave
I/O time dominant. This approach is used by Amazon Glacier; see
http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html
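
To make the tree hash idea concrete, here is a rough sketch in Python. It assumes the scheme the Glacier page describes: SHA-256 over 1 MiB chunks, then pairwise hashing of concatenated digests up to a single root, with an odd digest promoted unchanged. The names tree_hash, leaf_digests and CHUNK_SIZE are mine, and the whole input is held in memory for brevity; a real tool would read the file in chunks.

```python
import hashlib
from typing import List

CHUNK_SIZE = 1024 * 1024  # assumed leaf size: 1 MiB


def leaf_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """SHA-256 of each chunk. Every chunk is independent of the others,
    so this is the step that could be farmed out to multiple cores."""
    if not data:
        return [hashlib.sha256(b"").digest()]
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def tree_hash(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Combine leaf digests pairwise until a single root digest remains."""
    level = leaf_digests(data, chunk_size)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Hash the concatenation of two neighbouring digests.
                pair = level[i] + level[i + 1]
                next_level.append(hashlib.sha256(pair).digest())
            else:
                # An odd digest at the end moves up a level unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0].hex()
```

For input that fits in one chunk the result is just the plain SHA-256 of the data, so small files hash the same either way.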
MD5 is broken, and cannot be used for any security purposes. It cannot be
used for deduplication if any of the files are in the directories of
security researchers, who are likely to keep deliberately colliding files
around!
If security is not a concern, then there are many faster hashing algorithms
that avoid the costs imposed by the need to defend against adversaries.
See SipHash, MurmurHash, CityHash, etc.
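
If you want to measure the difference on your own hardware, something like the following sketch would do. It is an assumption-laden illustration: the hashes named above need third-party packages, so zlib.crc32 stands in here as a non-cryptographic checksum from the standard library, and the function names and buffer size are made up for the example. It times in-memory hashing only, so it says nothing about I/O.

```python
import hashlib
import time
import zlib


def throughput_mib_s(digest_fn, data: bytes, rounds: int = 5) -> float:
    """Return MiB/s achieved by calling digest_fn(data) `rounds` times."""
    start = time.perf_counter()
    for _ in range(rounds):
        digest_fn(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a coarse timer.
    return (len(data) * rounds) / (1024 * 1024) / max(elapsed, 1e-9)


def compare(size: int = 8 * 1024 * 1024) -> dict:
    """Hash the same zero-filled buffer with each algorithm."""
    data = bytes(size)
    return {
        "md5": throughput_mib_s(lambda d: hashlib.md5(d).digest(), data),
        "sha256": throughput_mib_s(lambda d: hashlib.sha256(d).digest(), data),
        # Stand-in for a fast non-cryptographic function (assumption).
        "crc32": throughput_mib_s(zlib.crc32, data),
    }


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name}: {rate:.0f} MiB/s")
```

The numbers depend heavily on the CPU and on how the Python build links its hash implementations, so treat them as relative rather than absolute.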
Simon
On Oct 2, 2014 11:18 AM, "Alex Duryee" wrote:
> Despite some of its relative flaws, MD5 is frequently selected over SHA-256
> in archives as the checksum algorithm of choice. One of the primary factors
> here is the longer processing time required for SHA-256, though there have
> been no empirical studies calculating that time difference and its overall
> impact on checksum generation and verification in a preservation
> environment.
>
> AVPreserve Consultant Alex Duryee recently ran a series of tests comparing
> the real time and cpu time used by each algorithm. His newly updated white
> paper "What Is the Real Impact of SHA-256?" presents the results and comes
> to some interesting conclusions regarding the actual time difference
> between the two and what other factors may have a greater impact on your
> selection decision and file monitoring workflow. The paper can be
> downloaded for free at
>
> http://www.avpreserve.com/papers-and-presentations/whats-the-real-impact-of-sha-256/
> .
> ______________________________________
>
> Alex Duryee
> *AVPreserve*
> 350 7th Ave., Suite 1605
> New York, NY 10001
>
> office: 917-475-9630
>
> http://www.avpreserve.com
> Facebook.com/AVPreserve <http://facebook.com/AVPreserve>
> twitter.com/AVPreserve
>