Carmen,
If you are only interested in de-duping and assessing file size, Picasa may
be overkill, though it does have some good organizing and browsing features.
Your developer may want to look at the Picasa (Desktop Client) Button
API, which can kick off scripts for processing selected photos:
https://developers.google.com/picasa/docs/button_api
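For plain exact-duplicate detection and size totals, a short standalone script may be enough. Below is a minimal sketch in Python (standard library only). It assumes the JPEGs sit under one directory tree and that "duplicate" means byte-identical, so resized or re-encoded copies of the same photo will not be caught. The function names are hypothetical and are not part of the Picasa API.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_jpegs(root, extensions=(".jpg", ".jpeg")):
    """Return (groups of byte-identical JPEG paths, total bytes of all JPEGs seen).

    Files are first bucketed by size (cheap, from the file system); only
    files that share a size are hashed, which skips hashing for any file
    whose size is unique.
    """
    by_size = defaultdict(list)
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):  # case-insensitive extension match
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                total_bytes += size
                by_size[size].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        for path in paths:
            by_hash[(size, sha256_of(path))].append(path)

    groups = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return groups, total_bytes
```

Calling `find_duplicate_jpegs("/path/to/data")` returns the duplicate groups plus the total bytes scanned. Grouping by size before hashing means files with a unique size are never read in full, which can save a lot of I/O on a collection of 300,000 images.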
-Shaun
On 3/19/13 4:51 PM, Carmen Mitchell wrote:
> Hello Code4Libbers,
>
> I'm working with a faculty member and trying to help them to formalize
> their data collection practices. Part of this process is also going through
> old data and trying to assess what they currently have. This particular
> faculty member has been doing research for 10 years without any kind of
> structure or regular method. So far we have over 2 TB of data in various
> states. (With more to come.)
>
> I've got a programmer working with me to:
> a) identify file types
> b) count how many files of each type
>
> We are now working on de-duping and assessing file size, focusing on the
> JPEGs first. With over 300,000 of them...it might take a while. (Of
> course they aren't following any kind of file naming structure,
> either...It's a mess.)
>
> Any tips or tricks or tools that you might know of to help speed up this
> process? Is there a good image recognition tool that you could suggest that
> would help us with automation?
>
> Thanks,
>
> Carmen Mitchell
> Institutional Repository Librarian
> Cal State San Marcos
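For steps (a) and (b) in the quoted message, identifying types by their leading "magic" bytes rather than by file extension helps when files have been renamed or have no extension. Here is a minimal Python sketch (standard library only). The signature table is a small hand-picked assumption, not an exhaustive list, and the function names are hypothetical. Anything unrecognized falls back to its extension.

```python
import os
from collections import Counter

# A few well-known leading-byte signatures (assumed subset; extend as needed).
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"%PDF", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container for .docx/.xlsx/.pptx
]


def sniff_type(path):
    """Label a file by its first bytes, falling back to its lowercased extension."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    ext = os.path.splitext(path)[1].lower()
    return "other:" + (ext or "(no extension)")


def tally_types(root):
    """Walk root and return (file count per type, total bytes per type)."""
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            kind = sniff_type(path)
            counts[kind] += 1
            sizes[kind] += os.path.getsize(path)
    return counts, sizes
```

`tally_types` gives both the per-type counts from step (b) and a per-type byte total, so the same pass also helps with assessing file size.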