If you have MS Office, Microsoft has an OCR engine built in. I used it
to OCR some college yearbooks at MPOW. It's not ABBYY but it works
pretty well! It's scriptable using VBScript or your MS language of choice.
Notice the "OCR" method in the document.
I can send you the scripts I have (they're short and simple) if you're
interested in some working code. Let me know.
Head of Digital Initiative
Paul Smith's College
[log in to unmask]
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Sent: Wednesday, July 28, 2010 11:47 AM
To: [log in to unmask]
Subject: [CODE4LIB] Free/Open OCR solutions?
I'm working on scanning some documents in a collection and then performing
OCR on the documents. Thus far, I've used Adobe Acrobat Pro's OCR
with some success, but the machines I'm working on are fairly old Pentium
Dell boxes, which makes opening 600 DPI scans painful and performing OCR an
entirely valid excuse for a long coffee break.
As you might expect, I'm looking for a way to speed up this process at the
OCR end of things, since the scanning can only move so quickly. I'm
wondering if any of you have experience with any open OCR solutions such as
Tesseract-OCR <http://code.google.com/p/tesseract-ocr/> or
At a glance, Tesseract seems to be further along in development. Any
suggestions on how best to approach this sort of task would be appreciated
if you've done similar work.
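
For anyone who wants to try the Tesseract route, here is a minimal sketch of a batch run in Python. It assumes the `tesseract` binary is installed and on the PATH, that the scans are `.tif` files it can read, and that English is the right language pack. The function names and directory names are hypothetical and are not taken from any script mentioned in this thread.

```python
import subprocess
from pathlib import Path


def tesseract_command(image: Path, out_dir: Path, lang: str = "eng") -> list[str]:
    """Build the command line: tesseract <image> <output base> -l <lang>."""
    # Tesseract appends ".txt" to the output base on its own.
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_directory(scan_dir: Path, out_dir: Path, lang: str = "eng") -> list[Path]:
    """Run Tesseract on every .tif in scan_dir; return the expected text files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(scan_dir.glob("*.tif")):
        # check=True stops the batch if Tesseract fails on a page.
        subprocess.run(tesseract_command(image, out_dir, lang), check=True)
        written.append(out_dir / (image.stem + ".txt"))
    return written
```

Calling `ocr_directory(Path("scans"), Path("text"))` would process the pages one at a time, so a run can be left unattended on a server rather than tying up a desktop machine.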
I've got my own Ubuntu Server I'm planning on evaluating one or both of
these on, as much for my own interest as the project's or the
organization's. Since I'm an unpaid part-time intern and the only one
working on this project, I'm willing to learn to do things the hard way if
they're easier in the long run.
Thanks for any suggestions or advice you may be able to offer.
~Andrew M. Kelly
MLIS Degree Candidate, Simmons GSLIS 2011
Archives & Librarianship Intern, Boston University: African Presidential
Archive & Research Center
Evening Library Assistant, Bay State College