Heh, well she works with teams of students and willing volunteers from
native communities. The faculty member in question has been documenting
endangered languages and working on revitalization efforts with several
communities,
including the Oklahoma Kickapoo, the Jicarilla Apache, the Q'anjob'al Maya
community in San Diego, and the Ixhil Maya community in Nebaj, El Quiché,
Guatemala. She wants to make all her research and data public - and has
permission to do so - but the collection needs some assessment and
structure first. It's a cool (if daunting) project.
We're starting with the JPEGs. I still have the audio and video files to
think about. :-(
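
For the byte-identical copies, the first pass will probably be something
like this hash-and-group sketch (untested, and the collection path is just
a placeholder):

import hashlib
import os
from collections import defaultdict

def sha256_of(path, chunk_size=1 << 20):
    # Hash in 1 MB chunks so a directory of large JPEGs doesn't eat RAM
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()

groups = defaultdict(list)
for root, _dirs, files in os.walk("/path/to/collection"):  # placeholder path
    for name in files:
        if name.lower().endswith((".jpg", ".jpeg")):
            path = os.path.join(root, name)
            groups[sha256_of(path)].append(path)

# Any digest with more than one path is a set of byte-identical files
for digest, paths in groups.items():
    if len(paths) > 1:
        print(digest, paths)
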
On Tue, Mar 19, 2013 at 3:08 PM, Kyle Banerjee <[log in to unmask]> wrote:
> On Tue, Mar 19, 2013 at 1:51 PM, Carmen Mitchell
> <[log in to unmask]> wrote:
>
> > We are now working on de-duping and assessing file size, focusing on the
> > JPEGs first. With over 300,000 of them...it might take a while. (Of
> > course they aren't following any kind of file naming structure,
> > either...It's a mess.)
> >
>
> 300K files in 10 years? That's more than 80 files per day, 7 days a week,
> 365 days per year. What the heck is this stuff? The method of organizing is
> going to depend on what it is since no one is going to be able to actually
> look at these things.
>
> Locating outright dups is totally braindead but you may have to deal with
> dups that have been resized or altered in some other way. At least for the
> images, exiftool can be handy for that purpose because whatever created the
> photos will have added all kinds of metadata that can be analyzed. Exiftool
> is also really handy for prioritizing processing and assigning metadata.
>
> kyle
>
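
Following up on the exiftool suggestion above: a rough sketch of what that
near-duplicate pass might look like. It assumes exiftool is installed and
on the PATH, and the tag choices and path are just placeholders to
experiment with.

import json
import subprocess
from collections import defaultdict

def exif_records(directory):
    # -r recurses into subdirectories; -json prints one record per file
    output = subprocess.run(
        ["exiftool", "-r", "-json", "-DateTimeOriginal", "-Model",
         "-ImageSize", directory],
        capture_output=True, text=True).stdout
    return json.loads(output) if output.strip() else []

candidates = defaultdict(list)
for rec in exif_records("/path/to/collection"):  # placeholder path
    key = (rec.get("DateTimeOriginal"), rec.get("Model"))
    if key[0]:  # ignore files with no capture timestamp
        candidates[key].append((rec.get("ImageSize"), rec["SourceFile"]))

# Same timestamp and camera but multiple files: likely resized copies
for key, files in candidates.items():
    if len(files) > 1:
        print(key, files)
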