Thanks for the replies! Unfortunately I suspect Kyle's right and I need to
set up a protocol for doing it by hand.

Your project sounds really interesting, Jason. Since the papers were
scanned on a black background, your approach would actually be very
useful, except that we deliberately crop to show the edges of the
pages. That means black space would be detected around the edge of each
image as well, and the variance in that would throw off the numbers. We
could crop the images again for processing, but if we're going to have
to go through each of them by hand anyway, I don't know whether the
time savings of doing something programmatic would be worth it.
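
That said, for the archives, here is roughly what a crop-then-count
version of Jason's idea might look like in Python with Pillow. The crop
margin and darkness threshold below are pure guesses that would need
tuning against our actual scans:

from PIL import Image

# Placeholder values -- both would need tuning against real scans.
EDGE_CROP = 100      # pixels to trim from each side to drop the page edges
DARK_THRESHOLD = 60  # grayscale value (0-255) below which a pixel is "dark"

def dark_pixel_ratio(path):
    """Fraction of dark pixels after cropping away the page edges."""
    img = Image.open(path).convert("L")  # load and convert to grayscale
    w, h = img.size
    img = img.crop((EDGE_CROP, EDGE_CROP, w - EDGE_CROP, h - EDGE_CROP))
    hist = img.histogram()             # 256 counts for a grayscale image
    dark = sum(hist[:DARK_THRESHOLD])  # pixels darker than the threshold
    total = img.size[0] * img.size[1]
    return dark / total

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print("%s\t%.4f" % (path, dark_pixel_ratio(path)))

Sorting the pages by that ratio wouldn't tell us exactly how much text
is gone, but it might at least flag the worst candidates for manual
review first.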

Thanks!

Christine

On Tue, Dec 1, 2015 at 3:53 PM, Jesse Martinez <[log in to unmask]>
wrote:

> Hi Jason,
>
> That sounds really interesting! Can you share a link to this game or code?
>
> Jesse
>
> On Tue, Dec 1, 2015 at 3:43 PM, Jason Bengtson <[log in to unmask]>
> wrote:
>
> > This may be a dumb thought, but I built a game a couple of years ago
> > which tracked results on a map (on an HTML canvas, with the map set
> > as a background and objects drawn on top of it) by counting the
> > pixels of a certain color and comparing them as a percentage against
> > the pixels in the whole map. You could do something similar by
> > comparing black or gray pixels beyond a particular threshold against
> > the total pixels. That would be a pretty rough-and-ready approach,
> > but it might be worth a shot. If the missing sections have a
> > significantly different color from the rest of the image, that could
> > be another metric to use.
> >
> > Best regards,
> > *Jason Bengtson, MLIS, MA*
> > Innovation Architect
> >
> >
> > *Houston Academy of Medicine*
> > *The Texas Medical Center Library*
> > 1133 John Freeman Blvd
> > Houston, TX 77030
> > http://library.tmc.edu/
> > www.jasonbengtson.com
> >
> > On Tue, Dec 1, 2015 at 2:07 PM, Christine Mayo <[log in to unmask]> wrote:
> >
> > > Hi all,
> > >
> > > I have an interesting assessment issue with some recently digitized
> > > newspapers, and I'm hoping someone can shed some light on it. We
> > > sent a batch of 19th-century newspapers off to a vendor knowing
> > > they weren't in great shape, and now we have to decide whether the
> > > resulting images (TIFFs) are usable or whether we should be looking
> > > for alternative copies and/or microfilm.
> > >
> > > A lot of the images are in decent shape, but the first few pages of
> > > each issue are heavily creased and generally missing a smallish
> > > piece from the center of the page where the folds met. I'm looking
> > > for a way to programmatically identify how much text is missing or
> > > unusable on each page. We haven't run OCR yet (part of this
> > > assessment is to figure out whether we should bother sending these
> > > items out for OCR and METS/ALTO creation), but I suspect we could
> > > run a quick-and-dirty in-house OCR if that would help.
> > >
> > > We can go through the images by hand and try to measure and/or
> > > count, but if anyone has worked on something like this or has
> > > thoughts, I'd love to hear them!
> > >
> > > Thanks,
> > > Christine
> > >
> > > --
> > > Christine Mayo
> > > Digital Production Librarian
> > > Thomas P. O'Neill, Jr. Library
> > > Boston College
> > > 140 Commonwealth Avenue
> > > Chestnut Hill, MA 02467
> > > [log in to unmask]
> > >
> >
>
>
>
> --
> Jesse Martinez
> Web Services Librarian
> O'Neill Library, Boston College
> [log in to unmask]
> 617-552-2509
>



-- 
Christine Mayo
Digital Production Librarian
Thomas P. O'Neill, Jr. Library
Boston College
140 Commonwealth Avenue
Chestnut Hill, MA 02467
[log in to unmask]