Tesseract had really poor quality the last time I tried it, and ABBYY's server product is ridiculously expensive (and charges per page). LEADTOOLS has an OCR SDK, but it too is expensive. If you want to go relatively cheap on this (and I don't know for sure, but it would probably break some licensing agreement with ABBYY), you could set up a web server with the $99 version of ABBYY FineReader and a hot folder configured to convert anything dropped into it to TXT. You would then have to write the backend yourself: keep track of the files that were submitted, let ABBYY convert them, and then show the results to the end user.
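What follows is a minimal Python sketch of that backend. It assumes that FineReader watches an "inbox" folder and writes each result as `<same name>.txt` into an "outbox" folder. Those folder names, the naming scheme, and the `HotFolderTracker` class are all hypothetical, so match them to however the hot folder is actually configured.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


class HotFolderTracker:
    """Track OCR jobs dropped into a hot folder (hypothetical layout).

    Assumes the OCR program watches `inbox` and, when finished, writes
    `<same stem>.txt` into `outbox`. Adjust to the real configuration.
    """

    def __init__(self, inbox, outbox):
        self.inbox = Path(inbox)
        self.outbox = Path(outbox)
        self.jobs = set()  # ids of jobs submitted through this tracker

    def submit(self, upload_path):
        """Copy an uploaded file into the hot folder under a unique name."""
        upload = Path(upload_path)
        job_id = uuid.uuid4().hex  # unique stem avoids name collisions
        shutil.copy(upload, self.inbox / (job_id + upload.suffix))
        self.jobs.add(job_id)
        return job_id

    def status(self, job_id):
        """Return 'unknown', 'pending', or 'done' for a job id."""
        if job_id not in self.jobs:
            return "unknown"
        return "done" if self._result_path(job_id).exists() else "pending"

    def result(self, job_id):
        """Return the recognized text, or None if it is not ready yet."""
        if self.status(job_id) != "done":
            return None
        return self._result_path(job_id).read_text(encoding="utf-8", errors="replace")

    def _result_path(self, job_id):
        return self.outbox / (job_id + ".txt")


if __name__ == "__main__":
    # Demo: simulate the OCR program by writing the .txt result by hand.
    with tempfile.TemporaryDirectory() as base:
        inbox, outbox = Path(base, "in"), Path(base, "out")
        inbox.mkdir()
        outbox.mkdir()
        scan = Path(base, "scan.tif")
        scan.write_bytes(b"not really an image")
        tracker = HotFolderTracker(inbox, outbox)
        job = tracker.submit(scan)
        print(tracker.status(job))  # pending
        (outbox / (job + ".txt")).write_text("recognized text", encoding="utf-8")
        print(tracker.status(job), tracker.result(job))  # done recognized text
```

The web front end would call `submit()` on upload and poll `status()` until the job is done. A real deployment would also need to avoid reading a .txt file that FineReader is still writing, for example by waiting until its size stops changing.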

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
Become a friend of Paul Smith's Library on Facebook today!

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric Lease Morgan
Sent: Tuesday, March 12, 2013 2:16 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] web-based ocr

Thank you for the prompt replies. 

Call me cheap or unable to navigate the political/fiscal landscape, but I don't see myself subscribing to a service. Instead, I see myself putting a wrapper around Tesseract, but alas, the wrappers are written in languages that I don't know. [1] Hmmm... On the Perl side, I am having problems installing Image::OCR::Tesseract.
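A wrapper does not necessarily need a language binding; it can simply shell out to the `tesseract` command-line program. Here is a minimal Python sketch along those lines. It assumes `tesseract` is installed and on the PATH, and that the English language data is available for the `-l eng` default. `build_command` and `ocr_image` are hypothetical helper names, not part of any existing wrapper.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(image_path, output_base, lang="eng", tesseract="tesseract"):
    """Build the argument list for the tesseract command-line program.

    The CLI takes an input image and an output *base* name; for plain-text
    output it appends ".txt" to that base itself.
    """
    return [tesseract, str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="eng", tesseract="tesseract", timeout=120):
    """Run tesseract on one image and return the recognized text.

    Raises FileNotFoundError if the tesseract binary cannot be found and
    subprocess.CalledProcessError if tesseract exits with an error.
    """
    with tempfile.TemporaryDirectory() as workdir:
        output_base = Path(workdir) / "page"
        cmd = build_command(image_path, output_base, lang, tesseract)
        # check=True turns a non-zero exit status into an exception.
        subprocess.run(cmd, check=True, capture_output=True, timeout=timeout)
        return output_base.with_suffix(".txt").read_text(encoding="utf-8")
```

This version writes to a temporary output base and reads the `.txt` file back, which also works with older Tesseract releases. Newer releases can additionally write the text to `stdout` directly.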

[1] Wrappers - http://code.google.com/p/tesseract-ocr/wiki/AddOns

--
Eric "Still Cogitating" Morgan