There are published papers on MD5 collisions, with associated examples.
Researchers at http://isi.jhu.edu are quite likely to have read and
downloaded them. E.g.
http://www.forensicfocus.com/Content/pid=87/page=2/

On Oct 3, 2014 3:05 PM, "Alexander Duryee" wrote:

> Simon - do you have any examples of MD5 collisions in JHU's collections?
> The chance of that occurring is vanishingly small
> (http://prezi.com/zfyebvaelksh/fixity-20/) so I'm curious what produced
> the collision, and how often.
>
> On Fri, Oct 3, 2014 at 12:14 PM, Kyle Banerjee wrote:
>
> > On Fri, Oct 3, 2014 at 7:26 AM, Charles Blair wrote:
> >
> > > Look at slide 15 here:
> > > http://www.slideshare.net/DuraSpace/sds-cwebinar-1
> > >
> > > I think we're worried about the cumulative effect over time of
> > > undetected errors (at least, I am).
> >
> > This slide shows that data loss via drive fault is extremely rare. Note
> > that a bit getting flipped is usually harmless. However, I do believe
> > that data corruption via other avenues will be considerably more common.
> >
> > My point is that the use case for libraries is generally weak and the
> > solution is very expensive -- don't forget the authenticity checks must
> > also be done on the "good" files. As you start dealing with more and
> > more data, this system is not sustainable for the simple reason that
> > maintained disk space costs a fortune and network capacity is a
> > bottleneck. It's no big deal to do this on a few TB since our
> > repositories don't have to worry about the integrity of dynamic data,
> > but you eventually get to a point where it sucks up too many systems
> > resources and consumes too much expertise.
> >
> > Authoritative files really should be offline but if online access to
> > authoritative files is seen as an imperative, it at least makes more
> > sense to just do something like dump it all in Glacier and slowly
> > refresh everything you own with authoritative copy. Or better yet, just
> > leave the stuff there and just make new derivatives when there is any
> > reason to believe the existing ones are not good.
> >
> > While I think integrity is an issue, I think other deficiencies in
> > repositories are more pressing. Except for the simplest use cases,
> > getting stuff in or out of them is a hopeless process even with
> > automated assistance. Metadata and maintenance aren't very good either.
> > That you still need coding skills to get popular platforms that have
> > been in use for many years to ingest and serve up things as simple as
> > documents and images speaks volumes.
> >
> > kyle