LISTSERV 16.5 - CODE4LIB Archives

On Tue, Mar 19, 2013 at 1:51 PM, Carmen Mitchell wrote:

> We are now working on de-duping and assessing file size, focusing on the
> JPEGs first. With over 300,000 of them...it might take a while. (Of
> course they aren't following any kind of file naming structure,
> either...It's a mess.)
>

300K files in 10 years? That's more than 80 files per day, 7 days a week,
365 days per year. What the heck is this stuff? The method of organizing is
going to depend on what it is since no one is going to be able to actually
look at these things.

Locating outright dups is totally braindead, but you may have to deal with
dups that have been resized or altered in some other way. At least for the
images, exiftool can be handy for that purpose because whatever created the
photos will have added all kinds of metadata that can be analyzed. Exiftool
is also really handy for prioritizing processing and assigning metadata.
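
For the outright dups, something along these lines would probably do. It's
only a rough sketch in Python, not a finished tool: the function names
(file_digest, find_exact_duplicates) are made up for illustration, and it
assumes "duplicate" means byte-for-byte identical. It groups files by size
first so that only files sharing a size get hashed, which matters when you
have 300K of them.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical.

    Hypothetical helper: only catches exact copies, not resized or
    re-saved versions of the same image.
    """
    # Pass 1: bucket regular files by size; a unique size cannot be a dup.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    # Keep only groups that contain more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling find_exact_duplicates("/path/to/jpegs") (the path is a placeholder)
gives back one list per set of identical files. Anything resized or
otherwise altered will hash differently and slip through, which is where
the metadata comes in.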

kyle