We also used the MS Office OCR engine to OCR student newspapers published between 1914 and 2006. We ran the OCR overnight using a batch program. It took a few weeks, but worked well.

Sarah

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Bill Janssen
Sent: Monday, August 02, 2010 2:09 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Free/Open OCR solutions?

Michael Beccaria <[log in to unmask]> wrote:

> Andrew,
> If you have MS Office, Microsoft has an OCR engine built in. I used it
> to OCR some college yearbooks at MPOW. It's not ABBYY but it works
> pretty well! It's scriptable using VBScript or your MS language of
> choice.
>
> http://msdn.microsoft.com/en-us/library/aa167607(office.11).aspx
> Notice the "OCR" method in the document.
>
> I can send you the scripts I have (they're short and simple) if you're
> interested in some working code. Let me know.
> Mike

Yes, I second that, it works pretty well. UpLib uses that by default when you install on Windows. In fact, if you install one of the older UpLib releases on a machine running Windows and having a recent Office, it will create a Windows service that's a network OCR server callable via HTTP from other machines. Even if you don't use UpLib for anything else, you can still use the installed OCR service, though the output format is somewhat UpLib-specific.

Perhaps an "uplib-ocr-document" command-line tool with hOCR output would be a good add to UpLib. It's a one-liner in a UNIX environment:

    cat `uplib-add-document --verbosity=0 --ocr --noupload foo.pdf | awk '{ print $2; }'`/contents.txt | tail -n +3

My longer-term plan with UpLib is to move to OCRopus when it's out of alpha.

Bill
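
For anyone who wants to reuse the UNIX one-liner from Bill's message, here is a rough sketch that splits it into named steps. It only restates what the one-liner implies: that uplib-add-document prints a line whose second field is the document folder, and that contents.txt in that folder starts with two header lines. The function names (second_field, skip_header, uplib_ocr_text) are made up for illustration and are not part of UpLib.

```shell
#!/bin/sh
# Sketch only: the same steps as the one-liner, split into functions.
# Assumptions taken from the one-liner itself, not from UpLib documentation:
#   - uplib-add-document prints a line whose second field is the folder path
#   - contents.txt in that folder begins with two header lines

# Print the second whitespace-separated field of each input line.
second_field() {
    awk '{ print $2; }'
}

# Drop the first two lines of the input and keep line 3 onward.
skip_header() {
    tail -n +3
}

# Hypothetical wrapper: OCR a PDF without uploading it, then print the text.
uplib_ocr_text() {
    folder=$(uplib-add-document --verbosity=0 --ocr --noupload "$1" | second_field)
    # Stop if no folder path came back (for example, the command failed).
    [ -n "$folder" ] || return 1
    skip_header < "$folder/contents.txt"
}
```

After sourcing the file, something like `uplib_ocr_text foo.pdf > foo.txt` should behave the same as the one-liner, with the added check that a folder path was actually returned before reading contents.txt.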