You may want to check out http://www.watchocr.com/

Mark

----- "Andy Kelly" wrote:

> I'm working on scanning some documents in a collection and then performing
> OCR on them. Thus far, I've used Adobe Acrobat Pro's OCR function with some
> success, but the machines I'm working on are fairly old Pentium 4 Dell
> boxes, which makes opening 600 DPI scans painful and performing OCR an
> entirely valid excuse for a long coffee break.
>
> As you might expect, I'm looking for a way to speed up this process at the
> OCR end of things, since the scanning can only move so quickly. I'm
> wondering if any of you have experience with any open OCR solutions such as
> Tesseract-OCR <http://code.google.com/p/tesseract-ocr/> or
> ocropus <http://code.google.com/p/ocropus/>. At a glance, Tesseract seems to
> be further along in development. Any other suggestions on how best to
> approach this sort of task would be appreciated if you've done similar work.
>
> I've got my own Ubuntu Server that I'm planning on evaluating one or both of
> these on, as much for my own interest as the project's or the
> organization's. Since I'm an unpaid part-time intern and the only one who's
> working on this project, I'm willing to learn to do things the hard way so
> they're easier in the long run.
>
> Thanks for any suggestions or advice you may be able to offer.
>
> --
> ~Andrew M. Kelly
> MLIS Degree Candidate, Simmons GSLIS 2011
> Archives & Librarianship Intern, Boston University: African Presidential
> Archive & Research Center
> Evening Library Assistant, Bay State College
> twitter: @a_m_kelly