Ocropus actually uses Tesseract as its OCR engine (with the idea that eventually you'll be able to plug other engines in), and adds the layout analysis component to it. I've been using it to OCR old manual typewriter pages and I've found it surprisingly good for that purpose. It uses the hOCR standard for its output, which takes a little getting used to (it's HTML with lots of positional markup), but it's easy to convert to XML for further processing. I use scripts that call ImageMagick to generate smaller images (300dpi, grayscale) to feed into Ocropus.

Peter

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Andy Kelly
Sent: Wednesday, July 28, 2010 9:47 AM
To: [log in to unmask]
Subject: [CODE4LIB] Free/Open OCR solutions?

I'm working on scanning some documents in a collection and then performing OCR on the documents. Thus far, I've used Adobe Acrobat Pro's OCR function with some success, but the machines I'm working on are fairly old Pentium 4 Dell boxes. This makes opening 600 DPI scans painful and performing OCR an entirely valid excuse for a long coffee break. As you might expect, I'm looking for a way to speed up this process at the OCR end of things, since the scanning can only move so quickly.

I'm wondering if any of you have experience with any open OCR solutions such as Tesseract-OCR <http://code.google.com/p/tesseract-ocr/> or ocropus <http://code.google.com/p/ocropus/>. At a glance, Tesseract seems to be further along in development. Any other suggestions on how best to approach this sort of task would be appreciated if you've done similar work.

I've got my own Ubuntu Server that I'm planning on evaluating one or both of these on, as much for my own interest as the project's or the organization's. Since I'm an unpaid part-time intern and the only one who's working on this project, I'm willing to learn to do things the hard way so they're easier in the long run.
Thanks for any suggestions or advice you may be able to offer.

--
~Andrew M. Kelly
MLIS Degree Candidate, Simmons GSLIS 2011
Archives & Librarianship Intern, Boston University: African Presidential Archive & Research Center
Evening Library Assistant, Bay State College
twitter: @a_m_kelly
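
For anyone trying the hOCR route mentioned in the reply at the top of this thread, here is a rough sketch of what "convert to XML for further processing" might look like. It is not Peter's script. It assumes the hOCR file marks each text line with class "ocr_line" and carries the position in the title attribute as "bbox x0 y0 x1 y1", which is what the hOCR convention describes; the output element names (page, line) and the sample markup are made up for illustration, and real Ocropus output may need adjustments.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# hOCR stores positions in the title attribute, e.g. title="bbox 300 200 1800 260".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class HocrLineParser(HTMLParser):
    """Collect the bounding box and text of every element with class 'ocr_line'."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished (bbox, text) pairs
        self._depth = 0    # nesting depth inside the current line, 0 = outside
        self._bbox = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self._depth:
            self._depth += 1
        elif "ocr_line" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            self._bbox = match.groups() if match else None
            self._chunks = []
            self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the HTML layout.
            text = " ".join("".join(self._chunks).split())
            self.lines.append((self._bbox, text))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def hocr_to_xml(hocr_text):
    """Return a small XML document (hypothetical schema) with one <line> per hOCR line."""
    parser = HocrLineParser()
    parser.feed(hocr_text)
    parser.close()
    page = ET.Element("page")
    for bbox, text in parser.lines:
        line = ET.SubElement(page, "line")
        if bbox:
            for name, value in zip(("x0", "y0", "x1", "y1"), bbox):
                line.set(name, value)
        line.text = text
    return ET.tostring(page, encoding="unicode")


# Made-up sample in the hOCR style, not real Ocropus output.
SAMPLE = """<html><body><div class='ocr_page' title='bbox 0 0 2550 3300'>
<span class='ocr_line' title='bbox 300 200 1800 260'>Dear Sir,</span>
<span class='ocr_line' title='bbox 300 280 2100 340'>Thank you for <b>your</b> letter.</span>
</div></body></html>"""

if __name__ == "__main__":
    print(hocr_to_xml(SAMPLE))
```

The sketch uses the standard library's lenient HTML parser rather than an XML parser because hOCR is HTML and is not guaranteed to be well-formed XML. Keeping the coordinates as attributes means later processing can still sort or filter lines by position on the page.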