Let me echo Jim in suggesting a transcription tool rather than OCR for
handwritten texts. However, a lot depends on the kinds of material you're
working with and the uses you plan for the transcripts. Is it structured
data, like census records, account books, or an index-card database?
Is it free-form text like diaries or letters? Does the text contain a lot of
genetic elements like strike-throughs, careted insertions and
marginalia? Do you want to index terms so that readers can view all
mentions of banjos within the text?
At present, no single tool supports all of these. I built and
maintain one (AGPL) tool for transcribing free-form text that is meant
to be indexed [Self-promotion: http://fromthepage.com/ is the tool;
source is at http://github.com/benwbrum/fromthepage/ ] and have spent
the last year building another (Apache) tool for converting tabular
records into a searchable database. I think they're great, and am
really excited about
them both. Nevertheless, last week I pointed a project at Jim's
T-PEN instead of my own tools, because the manuscripts were medieval
Arabic donation records which needed line-based transcription.
I maintain a list of transcription tools used in crowdsourcing
projects here: http://tinyurl.com/TranscriptionToolGDoc
Currently there are around 30 that I know of, and I'd be happy
to give my opinion of what's appropriate for your project on or off
list.
Ben Brumfield
http://manuscripttranscription.blogspot.com/