My concern is more this: given the proven weaknesses in MD5, do I want to risk the 1 in a billion chance that the “right” bit error creeps into an archive without changing the checksum, thus creating the illusion that the archive's integrity has not been violated?
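
As a rough illustration of the fixity check under discussion, here is a minimal Python sketch that reads a file once, computes more than one digest in that single pass, and reports which recorded digests no longer match. The function names (`file_digests`, `verify`) and the choice of MD5 plus SHA-256 are assumptions made for the example, not anything prescribed in this thread.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """Read the file once and return {algorithm name: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Stream in fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify(path, recorded):
    """Return the algorithm names whose recorded digest no longer matches."""
    current = file_digests(path, algorithms=tuple(recorded))
    return [name for name, digest in recorded.items() if current[name] != digest]
```

An empty list from `verify` means every recorded digest still matches. Keeping two independent digests means a change would have to slip past both at once to go unnoticed.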
--
Andrew Anderson, Director of Development, Library and Information Resources Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | http://www.facebook.com/LIRNnotes
On Oct 2, 2014, at 18:34, Jonathan Rochkind <[log in to unmask]> wrote:
> For checksums for ensuring archival integrity, are cryptographic flaws relevant? I'm not sure: is part of the point of a checksum to ensure against _malicious_ changes to files? I honestly don't know. (But in most systems, I'd guess anyone who had access to maliciously change the file would also have access to maliciously change the checksum!)
>
> ROT13 is not suitable as a checksum for ensuring archival integrity, however, because its output is no smaller than its input, and a compact digest is kind of what you're looking for.
>
> ________________________________________
> From: Code for Libraries [[log in to unmask]] on behalf of Cary Gordon [[log in to unmask]]
> Sent: Thursday, October 02, 2014 5:51 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] What is the real impact of SHA-256? - Updated
>
> +1
>
> MD5 is little better than ROT13. At least with ROT13, you have no illusions.
>
> We use SHA-512 for most work. We don't do finance or national security, so it is a good fit for us.
>
> Cary
>
> On Oct 2, 2014, at 12:30 PM, Simon Spero <[log in to unmask]> wrote:
>
>> Intel Skylake processors have dedicated SHA instructions.
>> See: https://software.intel.com/en-us/articles/intel-sha-extensions
>>
>> Using a tree hash approach (which is inherently embarrassingly parallel)
>> will leave I/O time dominant. This approach is used by Amazon Glacier - see
>> http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html
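
For readers unfamiliar with the idea, the following is a minimal Python sketch of a SHA-256 tree hash in the style the Glacier documentation describes, as far as I understand it: hash fixed-size chunks, then repeatedly hash adjacent pairs of digests until one remains. The function name `tree_hash` and the in-memory `bytes` input are assumptions for illustration, and the sketch runs serially rather than in parallel.

```python
import hashlib

MIB = 1024 * 1024  # assumed leaf size for this sketch: 1 MiB chunks


def tree_hash(data, chunk_size=MIB):
    """Return the hex root of a SHA-256 tree hash over `data` (bytes)."""
    # Leaf level: one digest per chunk. Each leaf is independent of the
    # others, which is what makes this step easy to spread across workers.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]

    # Combine adjacent pairs until a single root digest is left.
    while len(level) > 1:
        paired = []
        for i in range(0, len(level) - 1, 2):
            paired.append(hashlib.sha256(level[i] + level[i + 1]).digest())
        if len(level) % 2:
            # An unpaired last digest is carried up to the next level unchanged.
            paired.append(level[-1])
        level = paired
    return level[0].hex()
```

For input no larger than one chunk, the result is simply the plain SHA-256 of the data. A production version would stream chunks from disk instead of holding the whole file in memory.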
>>
>> MD5 is broken, and cannot be used for any security purposes. It cannot be
>> used for deduplication if any of the files are in the directories of
>> security researchers!
>>
>> If security is not a concern then there are many faster hashing algorithms
>> that avoid the costs imposed by the need to defend against adversaries.
>> See SipHash, MurmurHash, CityHash, etc.
>>
>> Simon
>> On Oct 2, 2014 11:18 AM, "Alex Duryee" <[log in to unmask]> wrote:
>>
>>> Despite some of its relative flaws, MD5 is frequently selected over SHA-256
>>> in archives as the checksum algorithm of choice. One of the primary factors
>>> here is the longer processing time required for SHA-256, though there have
>>> been no empirical studies calculating that time difference and its overall
>>> impact on checksum generation and verification in a preservation
>>> environment.
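
To make the real-time versus CPU-time distinction concrete, here is a minimal Python sketch that times MD5 and SHA-256 over the same in-memory buffer. It is not the method used in the white paper; the function name `time_hash`, the 16 MiB zero-filled payload, and the chunk size are all assumptions for illustration. Because the data is already in memory, the sketch leaves out disk I/O entirely.

```python
import hashlib
import time


def time_hash(algorithm, data, chunk_size=1024 * 1024):
    """Hash `data` with the named algorithm; return (hex digest, wall s, CPU s)."""
    hasher = hashlib.new(algorithm)
    wall_start = time.perf_counter()   # elapsed ("real") time
    cpu_start = time.process_time()    # CPU time used by this process
    for offset in range(0, len(data), chunk_size):
        hasher.update(data[offset:offset + chunk_size])
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return hasher.hexdigest(), wall, cpu


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zero bytes, purely synthetic
    for name in ("md5", "sha256"):
        digest, wall, cpu = time_hash(name, payload)
        print(f"{name}: wall {wall:.3f}s, cpu {cpu:.3f}s")
```

The numbers will vary by machine and Python build, so treat any single run as indicative only. Timing reads from real storage is needed to see how much of the total is I/O.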
>>>
>>> AVPreserve Consultant Alex Duryee recently ran a series of tests comparing
>>> the real time and CPU time used by each algorithm. His newly updated white
>>> paper "What Is the Real Impact of SHA-256?" presents the results and comes
>>> to some interesting conclusions regarding the actual time difference
>>> between the two and what other factors may have a greater impact on your
>>> selection decision and file monitoring workflow. The paper can be
>>> downloaded for free at
>>>
>>> http://www.avpreserve.com/papers-and-presentations/whats-the-real-impact-of-sha-256/
>>> .
>>> ______________________________________
>>>
>>> Alex Duryee
>>> *AVPreserve*
>>> 350 7th Ave., Suite 1605
>>> New York, NY 10001
>>>
>>> office: 917-475-9630
>>>
>>> http://www.avpreserve.com
>>> Facebook.com/AVPreserve <http://facebook.com/AVPreserve>
>>> twitter.com/AVPreserve
>>>