Hello Code4Libbers,

I'm working with a faculty member to help them formalize their data collection practices. Part of this process is going through old data and assessing what they currently have. This particular faculty member has been doing research for 10 years without any kind of structure or regular method. So far we have over 2 TB of data in various states (with more to come).

I've got a programmer working with me to:
a) identify file types
b) count how many files of each type

We are now working on de-duping and assessing file size, focusing on the JPEGs first. With over 300,000 of them, it might take a while. (Of course, they aren't following any kind of file naming structure, either. It's a mess.)

Any tips, tricks, or tools you know of that might help speed up this process? Is there a good image recognition tool you could suggest that would help us with automation?

Thanks,
Carmen Mitchell
Institutional Repository Librarian
Cal State San Marcos
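Here is a minimal Python sketch of the counting and first-pass de-duping steps described above, using only the standard library. It assumes the data sits under a single directory tree. It also treats the file extension as a stand-in for file type, which is only an approximation, since extensions can be missing or wrong. The function names `inventory`, `sha256_of`, and `exact_duplicates` are hypothetical, chosen for illustration.

For de-duping, files are first grouped by size, and only files that share a size with another file are hashed. A file with a unique size therefore never has to be read in full. The sketch finds byte-identical copies only. Resized or re-saved versions of the same photo would need a perceptual-hashing or image-similarity approach, which it does not attempt.

```python
import hashlib
import os
from collections import Counter, defaultdict
from pathlib import Path


def inventory(root):
    """Count files and total bytes per lowercased extension under root.

    Assumption: the extension is used as a rough proxy for file type.
    """
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return counts, sizes


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(root, extensions=(".jpg", ".jpeg")):
    """Return groups of byte-identical files whose extension is in `extensions`."""
    # Pass 1: group by size; a file with a unique size cannot have an exact copy.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in extensions:
                by_size[path.stat().st_size].append(path)
    # Pass 2: hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[sha256_of(path)].append(path)
    # Keep only hash groups with two or more members, i.e. real duplicates.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

To use it, call `inventory("/path/to/data")` for per-extension counts and byte totals, then `exact_duplicates("/path/to/data")` for lists of identical JPEGs. The path is a placeholder.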