At the risk of shameless self-promotion, I would suggest an alternative to
the attempt at using OCR for handwriting. My field of research focuses on
pre-modern manuscripts which, to no one's surprise, have resisted any OCR
method. One solution is to create an environment that makes transcribing
an effective and efficient task. To that end, here at Saint Louis
University, we built a web-based app called T-PEN. T-PEN attempts to
identify the location of each line on a digital surrogate and then displays
each line with a text box directly underneath it to support accurate
transcription.
The URL is t-pen.org. It's free for anyone. In addition to images from the
repositories that have given us access, users can upload their own private
images to work with.
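The message doesn't say how T-PEN finds the lines, so what follows is only a minimal sketch of one common baseline technique, a horizontal projection profile: count the dark pixels in each row of the page image and treat each run of "inky" rows as one text line. The function name `detect_lines`, its thresholds, and the toy grayscale input (0 = black, 255 = white) are assumptions for illustration, not T-PEN's method; real manuscript images would also need deskewing and noise removal first.

```python
from typing import List, Optional, Tuple


def detect_lines(
    image: List[List[int]],
    dark_threshold: int = 128,
    min_dark_pixels: int = 1,
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    Hypothetical helper: `image` is a grayscale page as rows of ints
    (0 = black, 255 = white). A row counts as ink if it has at least
    `min_dark_pixels` pixels darker than `dark_threshold`; consecutive
    ink rows are grouped into one line.
    """
    # Horizontal projection profile: number of dark pixels in each row.
    profile = [sum(1 for px in row if px < dark_threshold) for row in image]

    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, count in enumerate(profile):
        if count >= min_dark_pixels and start is None:
            start = y  # entering a band of ink
        elif count < min_dark_pixels and start is not None:
            lines.append((start, y))  # leaving the band
            start = None
    if start is not None:  # a band that runs to the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 7-row "page" with two inked bands.
W, B = 255, 0
page = [
    [W, W, W, W],
    [B, B, W, B],
    [W, B, B, W],
    [W, W, W, W],
    [W, W, W, W],
    [B, W, B, B],
    [W, W, W, W],
]
print(detect_lines(page))  # [(1, 3), (5, 6)]
```

In an interface like the one described, each returned band would be shown as an image strip with a text box beneath it.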
I know that this solution is not ideal for large sets of handwritten texts,
but T-PEN does support crowd-sourcing (what we call public projects). You
can also encode as you transcribe and then export the transcription as an
XML document. Currently, you can even export transcriptions as Open
Annotation Collaboration (OAC) RDF/XML.
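T-PEN's actual XML export format isn't described here, so this is a hypothetical sketch of the general idea: line-by-line transcriptions, each tied to the line's position on the image, serialized as XML with Python's standard ElementTree. The `<transcription>` and `<line>` element names, the `top`/`bottom` attributes, and `export_transcription` are all made up for illustration; a real project would more likely target an established schema such as TEI.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def export_transcription(lines: List[Tuple[int, int, str]], title: str) -> str:
    """Serialize (top, bottom, text) line records to an XML string.

    The <transcription>/<line> vocabulary is hypothetical, not T-PEN's format.
    """
    root = ET.Element("transcription", attrib={"title": title})
    for n, (top, bottom, text) in enumerate(lines, start=1):
        line = ET.SubElement(
            root,
            "line",
            attrib={"n": str(n), "top": str(top), "bottom": str(bottom)},
        )
        line.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(root, encoding="unicode")
```

Keeping the row coordinates next to each line's text means the transcription can always be traced back to the region of the image it came from.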
There is an introductory video at
http://www.youtube.com/watch?feature=player_embedded&v=_81fJbOpTcE.
Jim
On Tue, Mar 12, 2013 at 2:00 PM, Kyle Banerjee <[log in to unmask]> wrote:
> If it's for a discrete project, I'd say scan what you need OCR'd and put it
> on Mechanical Turk.
>
> kyle
>
>
> On Tue, Mar 12, 2013 at 10:56 AM, Donna Campbell <[log in to unmask]> wrote:
>
> > On a related note, I am looking for a recommendation for software that
> > provides OCR for handwriting (print and/or cursive). To clarify, this
> > would be pen and ink on paper, not digital ink.
> >
> > Thank you,
> > Donna R. Campbell
> > Technical Services & Systems Librarian
> > (215) 935-3872 (phone)
> > (267) 295-3641 (fax)
> > Mailing Address (via USPS):
> > Westminster Theological Seminary Library
> > P.O. Box 27009
> > Philadelphia, PA 19118 USA
> > Shipping Address (via UPS or FedEx):
> > Westminster Theological Seminary Library
> > 2960 W. Church Rd.
> > Glenside, PA 19038 USA
> >
> > -----Original Message-----
> > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> > Eric Lease Morgan
> > Sent: Tuesday, March 12, 2013 11:57 AM
> > To: [log in to unmask]
> > Subject: [CODE4LIB] web-based ocr
> >
> > Does anybody here know of a Web-based OCR program or Web service?
> >
> > Many people want to do OCR against digitized texts. We all know of various
> > OCR applications (Adobe Acrobat, ABBYY FineReader, Google's Tesseract,
> > etc.), but they are not necessarily Web-based. As a service to my
> > university, I thought it might be cool (or "kewl") to support an image-to-
> > text application. Go to Web form. Submit one or more image files. Have OCR
> > done against them no matter how dirty the output. Return plain text. As a
> > bonus, the application would support a RESTful API.
> >
> > Does anybody know of something like this that exists already?
> >
> > --
> > Eric Lease Morgan
> > University of Notre Dame
> >
>
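Eric's request quoted above (submit images, run OCR however dirty the output, get plain text back, ideally through a RESTful API) can be sketched with nothing but the Python standard library. This is a minimal sketch under several assumptions, not a description of any existing service: it assumes the Tesseract command-line tool is installed and on PATH; the `/ocr` path, port 8000, and the names `tesseract_ocr`, `make_handler`, and `serve` are hypothetical; and it accepts one image as the raw POST body per request rather than parsing a multipart HTML upload form, which would need extra code or a web framework.

```python
import os
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def tesseract_ocr(image_bytes: bytes) -> str:
    """OCR raw image bytes with the Tesseract CLI and return plain text.

    Assumes a `tesseract` binary on PATH. Passing "stdout" as the output
    base makes Tesseract print the text instead of writing a .txt file.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "upload")
        with open(path, "wb") as fh:
            fh.write(image_bytes)
        result = subprocess.run(
            ["tesseract", path, "stdout"], capture_output=True, check=True
        )
    return result.stdout.decode("utf-8", errors="replace")


def make_handler(ocr: Callable[[bytes], str]):
    """Build a request handler that OCRs the body of POST /ocr."""

    class OCRHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            # Read the whole body first so the connection closes cleanly
            # even when the request is rejected below.
            length = int(self.headers.get("Content-Length", 0))
            image = self.rfile.read(length) if length > 0 else b""
            if self.path != "/ocr":
                self.send_error(404)
                return
            if not image:
                self.send_error(400, "Empty request body")
                return
            try:
                text = ocr(image)
            except Exception:  # e.g. tesseract missing or unreadable image
                self.send_error(500, "OCR failed")
                return
            body = text.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return OCRHandler


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the service until interrupted (blocks the calling thread)."""
    server = ThreadingHTTPServer((host, port), make_handler(tesseract_ocr))
    server.serve_forever()
```

After calling `serve()`, something like `curl --data-binary @page.png http://127.0.0.1:8000/ocr` should return the recognized text as plain text; several images mean several requests. The OCR function is passed in rather than hard-coded so another engine (or a stub, for testing) can be swapped in. None of this helps with handwriting, which, as the thread notes, generally resists OCR.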
--
----------
James R. Ginther, PhD
Professor of Medieval Theology,
Associate Chair, Department of Theology
& Director, Center for Digital Theology
Saint Louis University
-------------------------
[log in to unmask]
Faculty Page: Departmental Page <https://sites.google.com/a/slu.edu/james-ginther/>
Research Blog: http://digital-editor.blogspot.com
Twitter: DH_editor <http://twitter.com/#!/DH_editor>
T-PEN: www.tpen.org/