Andrew, 
If you have MS Office, Microsoft has an OCR engine built in. I used it
to OCR some college yearbooks at MPOW. It's not ABBYY but it works
pretty well! It's scriptable using VBScript or your MS language of
choice.

http://msdn.microsoft.com/en-us/library/aa167607(office.11).aspx
Notice the "OCR" method in the document.

I can send you the scripts I have (they're short and simple) if you're
interested in some working code. Let me know.
Mike

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
[log in to unmask]


-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Andy Kelly
Sent: Wednesday, July 28, 2010 11:47 AM
To: [log in to unmask]
Subject: [CODE4LIB] Free/Open OCR solutions?

I'm working on scanning some documents in a collection and then
performing OCR on them. Thus far, I've used Adobe Acrobat Pro's OCR
function with some success, but the machines I'm working on are fairly
old Pentium 4 Dell boxes, which makes opening 600 DPI scans painful and
performing OCR an entirely valid excuse for a long coffee break.

As you might expect, I'm looking for a way to speed up this process at
the OCR end of things, since the scanning can only move so quickly. I'm
wondering if any of you have experience with any open OCR solutions such
as Tesseract-OCR <http://code.google.com/p/tesseract-ocr/> or
ocropus <http://code.google.com/p/ocropus/>.
At a glance, Tesseract seems to be further along in development. Any
other suggestions on how best to approach this sort of task would be
appreciated if you've done similar work.
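
For anyone trying the Tesseract route on a folder of scans, a minimal
Python sketch of an unattended batch run might look like the following.
It assumes the `tesseract` command-line tool is installed and on the
PATH, and it relies only on the basic `tesseract <image> <outputbase>
-l <lang>` invocation. The function names, the `*.tif` pattern, and the
directory names are hypothetical, chosen for illustration only.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, out_dir, lang="eng"):
    """Build the argument list for one Tesseract run.

    Tesseract appends ".txt" to the output base on its own, so the
    base is the image name without its extension.
    """
    image = Path(image)
    out_base = Path(out_dir) / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_directory(scan_dir, out_dir, pattern="*.tif", lang="eng"):
    """OCR every matching scan in scan_dir, one image at a time.

    Returns the list of text files Tesseract is expected to write.
    Assumes the tesseract binary is available on the PATH.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(scan_dir).glob(pattern)):
        cmd = build_tesseract_command(image, out_dir, lang)
        # check=True stops the batch if Tesseract reports an error
        subprocess.run(cmd, check=True)
        written.append(Path(cmd[2] + ".txt"))
    return written
```

Calling `ocr_directory("scans", "text")` would then leave one text file
per scan in `text/`. Since nothing needs to be opened in a viewer, a run
like this can be left going on a slow machine while scanning continues
elsewhere.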

I've got my own Ubuntu Server I'm planning on evaluating one or both of
these on, as much for my own interest as the project's or the
organization's. Since I'm an unpaid part-time intern and the only one
who's working on this project, I'm willing to learn to do things the
hard way so they're easier in the long run.

Thanks for any suggestions or advice you may be able to offer.

-- 
~Andrew M. Kelly
MLIS Degree Candidate, Simmons GSLIS 2011
Archives & Librarianship Intern, Boston University: African Presidential
Archive & Research Center
Evening Library Assistant, Bay State College
twitter: @a_m_kelly