LISTSERV 16.5 - CODE4LIB Archives

Hi,

Regarding handwriting, you could always train an OCR engine to handle 
it, and several OCR tools attempt it out of the box (probably the most 
notable is Evernote) ...but yeah, the results vary greatly depending on 
the style of writing. Most focus on hand-printed text, such as Post-it 
notes.

And a quick thing I found out recently about Tesseract: it is pretty 
good if all you want is the extracted text. It does not do layout 
recognition very well, so the output will look funky if there are layout 
oddities, like footnotes. But it really depends on what you have and 
what you're trying to do. For example, I did not have much success 
making EPUBs with Tesseract, but it worked great with our theses (which 
have mandatory layout requirements). So that's another big bonus of using 
the Internet Archive (who, I think, use ABBYY).
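For the plain-text case, a wrapper can be very thin. Below is a rough 
Python sketch (not one of the wrappers on the AddOns page, just an 
illustration) that shells out to the tesseract command-line program, 
which writes its text to OUTPUTBASE.txt. It assumes tesseract is 
installed and on your PATH; the function names are made up for 
illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for the tesseract CLI (hypothetical helper).

    `tesseract IMAGE OUTPUTBASE -l LANG` writes plain text to OUTPUTBASE.txt.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run tesseract on one image and return the extracted text.

    Assumes the `tesseract` binary is installed and on PATH.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        output_base = Path(tmpdir) / "page"
        cmd = build_tesseract_command(image_path, output_base, lang)
        # check=True raises CalledProcessError if tesseract exits non-zero
        subprocess.run(cmd, check=True, capture_output=True)
        # tesseract appends ".txt" to the output base name
        return output_base.with_suffix(".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    import sys

    # Usage: python ocr_sketch.py scan1.png scan2.tif ...
    for path in sys.argv[1:]:
        print(ocr_image(path))
```

This only gets you the raw text; it does nothing about the layout 
problems mentioned above.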



b,chris.


Eric Lease Morgan wrote:
>
> Thank you for the prompt replies.
>
> Call me cheap or unable to navigate the political/fiscal landscape, 
> but I don't see myself subscribing to a service. Instead I see putting 
> a wrapper around Tesseract, but alas, the wrappers are written in 
> languages that I don't know. [1] Hmmm… On the Perl side, I am having 
> problems installing Image::OCR::Tesseract.
>
> [1] Wrappers - http://code.google.com/p/tesseract-ocr/wiki/AddOns
>
> --
> Eric "Still Cogitating" Morgan