Something like this is on my "to do" list for our future Fedora Commons deployment here at UConn. I was considering wrapping a SOAP interface around something like the Perl Image::OCR::Tesseract module and adding it to our ingest pipeline, unless someone can recommend a better OCR application.


-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Till Kinstler
Sent: Tuesday, March 12, 2013 12:30 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] web-based ocr

On 12.03.2013 16:57, Eric Lease Morgan wrote:

> Does anybody know of something like this that exists already?

We are running something like this, not with an HTML or RESTful front end but with WebDAV. The users of this service do "mass digitization". They mount their individual WebDAV shares, push scanned image files there, and read the OCR results from output files (usually not "by hand" but with software that manages their digitization workflow).
The actual OCR is done by an ABBYY Recognition Server; the "WebDAV front end", including accounting, is a straightforward home-brewed solution.
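A rough sketch of the "drop images in a share, pick up results from output files" pattern, for anyone wanting to try it with open-source tools. This is not the VZG implementation. It assumes results are written next to each image as <image name>.txt and that the OCR engine is a pluggable function. The names process_hot_folder and tesseract_ocr are hypothetical. The optional tesseract_ocr helper swaps in the Tesseract command-line tool (using its "stdout" output base) in place of ABBYY Recognition Server.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

# Extensions treated as scanned images (an assumption; adjust to your scanners' output).
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def tesseract_ocr(image: Path) -> str:
    """Hypothetical OCR backend: run the tesseract CLI and capture text from stdout."""
    done = subprocess.run(
        ["tesseract", str(image), "stdout"],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def process_hot_folder(share: Path, ocr: Callable[[Path], str]) -> list[Path]:
    """OCR every image in `share` that has no result file yet.

    Each result is written as '<image name>.txt' beside the image, so a client
    watching the share can collect it. Returns the result files written this pass.
    """
    written = []
    for image in sorted(share.iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip directories, result files and anything not an image
        result = image.with_name(image.name + ".txt")
        if result.exists():
            continue  # already processed in an earlier pass
        text = ocr(image)
        # Write under a temporary name, then rename, so clients never see a half-written file.
        partial = result.with_name(result.name + ".part")
        partial.write_text(text, encoding="utf-8")
        partial.replace(result)
        written.append(result)
    return written


if __name__ == "__main__":
    # Demo with a fake OCR function, so no OCR engine is needed to try it.
    with tempfile.TemporaryDirectory() as d:
        share = Path(d)
        (share / "page1.tif").write_bytes(b"not really an image")
        for path in process_hot_folder(share, lambda p: "OCR text for " + p.name):
            print(path.name, "->", path.read_text(encoding="utf-8"))
```

A real deployment would also have to skip files that are still being uploaded over WebDAV (for example, by waiting until a file's size stops changing) and keep per-user accounting, both of which this sketch leaves out. It would run the function periodically over each user's share.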


Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
[log in to unmask], +49 (0) 551 39-13431