Thanks, Shaun and Terry. I'll pass this info along. Terry, I may have Tyson
contact you directly if he has questions.

I look forward to seeing your lightning talk!

Carmen

On Tue, Mar 19, 2013 at 2:09 PM, Shaun Ellis wrote:

> Carmen,
>
> If you are only interested in de-duping and assessing file size, it may be
> overkill. Picasa has some good organizing and browsing features. Your
> developer may want to look at the Picasa (Desktop Client) Button API, which
> can kick off scripts for processing selected photos:
> https://developers.google.com/picasa/docs/button_api
>
> -Shaun
>
> On 3/19/13 4:51 PM, Carmen Mitchell wrote:
>
>> Hello Code4Libbers,
>>
>> I'm working with a faculty member and trying to help them formalize
>> their data collection practices. Part of this process is also going
>> through old data and trying to assess what they currently have. This
>> particular faculty member has been doing research for 10 years without
>> any kind of structure or regular method. So far we have over 2 TB of
>> data in various states. (With more to come.)
>>
>> I've got a programmer working with me to:
>> a) identify file types
>> b) count how many files of each type
>>
>> We are now working on de-duping and assessing file size, focusing on the
>> JPEGs first. With over 300,000 of them... it might take a while. (Of
>> course they aren't following any kind of file naming structure,
>> either... It's a mess.)
>>
>> Any tips or tricks or tools that you might know of to help speed up this
>> process? Is there a good image recognition tool that you could suggest
>> that would help us with automation?
>>
>> Thanks,
>>
>> Carmen Mitchell
>> Institutional Repository Librarian
>> Cal State San Marcos
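
For the de-duping and file-size step described in the original message, here is a minimal Python sketch, not anything the thread itself proposes. It assumes "duplicate" means byte-for-byte identical files, so resized or re-saved copies of the same photo would not be caught. The function names (`sha256_of`, `find_duplicates`) and the extension list are illustrative. Files are grouped by size first, so only files that share a size ever get hashed, which should save a lot of reading across 300,000 JPEGs.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large images never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, extensions=(".jpg", ".jpeg")):
    """Return {(size_in_bytes, sha256): [paths]} for byte-identical files under root.

    File naming is ignored entirely; only the extension filter looks at names.
    """
    # Pass 1: group candidate files by size (cheap, no file contents read).
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files whose size is shared with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, sha256_of(path))].append(path)

    # Keep only groups that really contain more than one file.
    return {key: sorted(paths) for key, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_duplicates("/path/to/data")` (a placeholder path) returns one entry per group of identical files, keyed by size and hash, so the size of each duplicate group is available without a second pass.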