This may be a dumb thought, but a couple of years ago I built a game that
tracked results on a map (an HTML canvas, with the map set as the
background and objects drawn on top of it) by counting the pixels of a
certain color and comparing them, as a percentage, against the pixels in
the whole map. You could do something similar by counting the pixels that
are black or gray beyond a particular threshold and comparing that count
against the total pixels. That would be a pretty rough-and-ready approach,
but it might be worth a shot. If the missing sections have a significantly
different color than the rest of the image, that could be another metric
to use.
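
For what it's worth, here is a minimal sketch of that pixel-counting idea in
Python. It assumes each page has already been reduced to rows of 0-255
grayscale values; the function name `dark_pixel_fraction`, the threshold of
40, and the tiny sample "page" are all made up for illustration, not taken
from any real scan.

```python
def dark_pixel_fraction(pixels, threshold=40):
    """Return the fraction of pixels whose gray level is <= threshold.

    pixels: iterable of rows, each row an iterable of 0-255 grayscale values.
    threshold: hypothetical cutoff; anything at or below it counts as "dark".
    """
    total = 0
    dark = 0
    for row in pixels:
        for value in row:
            total += 1
            if value <= threshold:
                dark += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return dark / total


# Synthetic 4x4 "page": light paper with a 2x2 dark hole in the middle.
page = [
    [230, 228, 231, 229],
    [227,   0,   5, 230],
    [229,   3,   0, 228],
    [231, 230, 229, 227],
]

print(f"{dark_pixel_fraction(page):.1%} of the page is at or below the threshold")
```

For real TIFFs you would need an imaging library to get at the pixel values,
and the threshold would have to be tuned against a few pages you have
measured by hand. Printed ink will also count as dark, so the number is
probably more useful for ranking pages against each other than as an exact
measure of missing text.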

Best regards,
*Jason Bengtson, MLIS, MA*
Innovation Architect


*Houston Academy of Medicine-The Texas Medical Center Library*
1133 John Freeman Blvd
Houston, TX   77030
http://library.tmc.edu/
www.jasonbengtson.com

On Tue, Dec 1, 2015 at 2:07 PM, Christine Mayo <[log in to unmask]> wrote:

> Hi all,
>
> I have an interesting assessment issue with some recently digitized
> newspapers that I wondered if anyone could shed some light on. We sent a
> batch of 19th century newspapers off to a vendor knowing they weren't in
> great shape, and now we have to decide whether the resultant images (TIFFs)
> are usable or we should be looking for alternative copies and/or microfilm.
>
> A lot of the images are in decent shape, but the first few pages of each
> issue are heavily creased and generally missing a smallish piece from the
> center of the page where the folds met. I'm looking for a way to
> programmatically identify how much text is missing/unusable for each page.
> We haven't run OCR yet, part of this assessment is to figure out whether we
> should bother sending these items out for OCR and METS/ALTO creation, but I
> suspect we could run a quick and dirty in-house OCR if that would help.
>
> We can go through the images by hand and try to measure and/or count, but
> if anyone's worked on something like this or has thoughts, I'd love to hear
> them!
>
> Thanks,
> Christine
>
> --
> Christine Mayo
> Digital Production Librarian
> Thomas P. O'Neill, Jr. Library
> Boston College
> 140 Commonwealth Avenue
> Chestnut Hill, MA 02467
> [log in to unmask]
>