Thanks for all of these responses. I'm looking forward to investigating
further over the weekend. I'll let you know how it goes.




~Andrew M. Kelly
MLIS Degree Candidate, Simmons GSLIS 2011
Archives & Librarianship Intern, Boston University: African Presidential
Archive & Research Center
Evening Library Assistant, Bay State College
twitter: @a_m_kelly


On Wed, Jul 28, 2010 at 8:57 PM, Jason Ronallo <[log in to unmask]> wrote:

> I don't know if this would help, but you may want to look at this
> script which wraps cuneiform and other utilities to OCR a PDF. Since
> you're not starting with a PDF you could modify it or write something
> similar in your scripting language of choice.
>
> http://github.com/gkovacs/pdfocr
> https://launchpad.net/~gezakovacs/+archive/pdfocr
>
> Jason
>
> On Wed, Jul 28, 2010 at 11:46 AM, Andy Kelly <[log in to unmask]> wrote:
> > I'm working on scanning some documents in a collection and then
> > performing OCR on the documents. Thus far, I've used Adobe Acrobat Pro's
> > OCR function with some success, but the machines I'm working on are
> > fairly old Pentium 4 Dell boxes; this makes opening 600 DPI scans painful
> > and performing OCR an entirely valid excuse for a long coffee break.
> >
> > As you might expect, I'm looking for a way to speed up this process at
> > the OCR end of things, since the scanning can only move so quickly. I'm
> > wondering if any of you have experience with any open OCR solutions such
> > as Tesseract-OCR <http://code.google.com/p/tesseract-ocr/> or
> > ocropus <http://code.google.com/p/ocropus/>.
> > At a glance, Tesseract seems to be further along in development. Any
> > other suggestions on how best to approach this sort of task would be
> > appreciated if you've done similar work.
> >
> > I've got my own Ubuntu Server I'm planning on evaluating one or both of
> > these on, as much for my own interest as the project's or the
> > organization's. Since I'm an unpaid part-time intern and the only one
> > who's working on this project, I'm willing to learn to do things the hard
> > way so they're easier in the long run.
> >
> > Thanks for any suggestions or advice you may be able to offer.
> >
> > --
> > ~Andrew M. Kelly
> > MLIS Degree Candidate, Simmons GSLIS 2011
> > Archives & Librarianship Intern, Boston University: African Presidential
> > Archive & Research Center
> > Evening Library Assistant, Bay State College
> > twitter: @a_m_kelly
> >
>
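
For anyone following the thread, here is a rough sketch of the "write something similar in your scripting language of choice" idea, applied to a folder of scanned images rather than a PDF. It is not taken from pdfocr. It assumes the tesseract command-line tool is installed and on the PATH, that the scans are image files in a single directory, and that plain-text output is enough. The function names, the suffix list, and the directory names are all made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumed set of scan formats; adjust to whatever the scanner produces.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg"}


def build_ocr_commands(scan_dir, out_dir, lang="eng"):
    """Return one tesseract command per scanned image, sorted by filename."""
    scan_dir, out_dir = Path(scan_dir), Path(out_dir)
    commands = []
    for image in sorted(scan_dir.iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            # tesseract takes an output *base* name and adds ".txt" itself.
            out_base = out_dir / image.stem
            commands.append(["tesseract", str(image), str(out_base), "-l", lang])
    return commands


def run_ocr(scan_dir, out_dir, lang="eng"):
    """Run tesseract over every image in scan_dir, one file at a time."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_ocr_commands(scan_dir, out_dir, lang):
        # check=True stops the batch on the first failed page.
        subprocess.run(cmd, check=True)
```

Something like run_ocr("scans", "ocr_text") could then be left running unattended on the Ubuntu server instead of on the scanning workstations. Building the command list separately from running it makes it easy to print the commands first and check them before committing to a long batch.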