LISTSERV 16.5 - CODE4LIB Archives

There are published papers on MD5 collisions, with associated examples.

Researchers at http://isi.jhu.edu are quite likely to have read and
downloaded them.
E.g.:
http://www.forensicfocus.com/Content/pid=87/page=2/
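One way to tell a published, deliberately constructed MD5 collision apart from an ordinary duplicate is to group files by MD5 and then compare a second digest within each group. Identical files share both digests. Files that share an MD5 but differ in SHA-256 are genuine collisions, such as the example pairs from the papers. This is a minimal sketch that assumes the files sit on a local filesystem. `digests` and `find_md5_collisions` are hypothetical helper names and are not part of any repository platform.

```python
import hashlib
from collections import defaultdict


def digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha256_hex) for one file, reading it once in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def find_md5_collisions(paths):
    """Return {md5: {sha256: [paths]}} for MD5 values shared by differing files.

    Equal MD5 plus equal SHA-256 is treated as a plain duplicate and is not
    reported. Equal MD5 with two or more distinct SHA-256 values is reported
    as a true MD5 collision.
    """
    by_md5 = defaultdict(dict)  # md5 -> {sha256: [paths]}
    for p in paths:
        md5_hex, sha_hex = digests(p)
        by_md5[md5_hex].setdefault(sha_hex, []).append(str(p))
    return {m: groups for m, groups in by_md5.items() if len(groups) > 1}
```

For example, `find_md5_collisions(p for p in pathlib.Path("/path/to/collection").rglob("*") if p.is_file())` returns an empty dict when no true collisions exist. The path is a placeholder. Feeding both hash objects from the same chunks keeps the check to a single read of each file.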
On Oct 3, 2014 3:05 PM, "Alexander Duryee" wrote:

> Simon - do you have any examples of MD5 collisions in JHU's collections?
> The chance of that occurring is vanishingly small (
> http://prezi.com/zfyebvaelksh/fixity-20/) so I'm curious what produced the
> collision, and how often.
>
> On Fri, Oct 3, 2014 at 12:14 PM, Kyle Banerjee wrote:
>
> > On Fri, Oct 3, 2014 at 7:26 AM, Charles Blair wrote:
> >
> > > Look at slide 15 here:
> > > http://www.slideshare.net/DuraSpace/sds-cwebinar-1
> > >
> > > I think we're worried about the cumulative effect over time of
> > > undetected errors (at least, I am).
> >
> >
> > This slide shows that data loss via drive fault is extremely rare. Note
> > that a bit getting flipped is usually harmless. However, I do believe that
> > data corruption via other avenues will be considerably more common.
> >
> > My point is that the use case for libraries is generally weak and the
> > solution is very expensive -- don't forget the authenticity checks must
> > also be done on the "good" files. As you start dealing with more and more
> > data, this system is not sustainable for the simple reason that maintained
> > disk space costs a fortune and network capacity is a bottleneck. It's no
> > big deal to do this on a few TB since our repositories don't have to worry
> > about the integrity of dynamic data, but you eventually get to a point
> > where it sucks up too many system resources and consumes too much
> > expertise.
> >
> > Authoritative files really should be offline, but if online access to
> > authoritative files is seen as an imperative, it at least makes more sense
> > to just do something like dump it all in Glacier and slowly refresh
> > everything you own from the authoritative copy. Or better yet, just leave
> > the stuff there and make new derivatives when there is any reason to
> > believe the existing ones are not good.
> >
> > While I think integrity is an issue, I think other deficiencies in
> > repositories are more pressing. Except for the simplest use cases, getting
> > stuff in or out of them is a hopeless process even with automated
> > assistance. Metadata and maintenance aren't very good either. That you
> > still need coding skills to get popular platforms that have been in use for
> > many years to ingest and serve up things as simple as documents and images
> > speaks volumes.
> >
> > kyle
> >
>
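Kyle's point that authenticity checks must also run over the "good" files means every audit reads every byte of every copy. The cost therefore grows linearly with the holdings. This is a back-of-envelope sketch that assumes one sustained read rate and decimal units. `audit_hours` is a hypothetical name. The 100 TB and 200 MB/s figures below are illustrative inputs, not numbers from the thread.

```python
def audit_hours(collection_tb, throughput_mb_per_s, copies=1):
    """Hours needed to read every byte of `copies` replicas once.

    Assumes a constant sustained throughput and decimal units
    (1 TB = 10**12 bytes, 1 MB = 10**6 bytes). Network transfer,
    retries, and contention with other workloads are ignored.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_bytes = collection_tb * 10**12 * copies
    seconds = total_bytes / (throughput_mb_per_s * 10**6)
    return seconds / 3600
```

With these assumed inputs, `audit_hours(100, 200)` comes to about 139 hours for one pass over 100 TB. `audit_hours(100, 200, copies=3)` comes to about 417 hours, and that is before any data crosses the network.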