Hi,
With regard to handwriting, you could always train an OCR library to do
this, and there are several OCR tools that attempt to do it
out of the box (probably the most notable is Evernote)... but yeah, the
results vary greatly depending on the style of writing. Most focus on
just hand-printed things like Post-its.
And a quick thing I found out recently about Tesseract: it is pretty
good if all you want is the text extracted. It does not do layout
recognition very well, so output will look funky if there are layout
oddities... like footnotes. But it really depends on what you have and
what you're trying to do. For example, I did not have much success
making EPUBs with Tesseract, but it worked great with our theses (which
have mandatory layout requirements). So that's another big bonus for
using the Internet Archive (who, I think, use ABBYY).
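
For what it's worth, a wrapper around Tesseract for plain text
extraction does not need much. Below is a minimal Python sketch, not a
tested tool. It assumes the tesseract command-line program is installed
and on your PATH; the function names build_command and ocr_image are
made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def build_command(image_path, output_base, lang=None):
    """Build the argument list for the tesseract command-line tool.

    Assumes the usual form: tesseract IMAGE OUTPUTBASE [-l LANG]
    """
    cmd = ["tesseract", str(image_path), str(output_base)]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_image(image_path, lang=None):
    """Run tesseract on one image and return the extracted plain text."""
    with tempfile.TemporaryDirectory() as tmp:
        output_base = Path(tmp) / "page"
        # check=True raises CalledProcessError if tesseract exits non-zero
        subprocess.run(
            build_command(image_path, output_base, lang),
            check=True,
            capture_output=True,
        )
        # tesseract appends ".txt" to the output base by itself
        return (Path(tmp) / "page.txt").read_text(encoding="utf-8")
```

You would call it with something like ocr_image("page-001.png",
lang="eng"), where the file name is only a placeholder. It returns the
text and nothing else, so the layout caveats above still apply.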
b,chris.
Eric Lease Morgan wrote:
>
> Thank you for the prompt replies.
>
> Call me cheap or unable to navigate the political/fiscal landscape,
> but I don't see myself subscribing to a service. Instead I see putting
> a wrapper around Tesseract, but alas, the wrappers are written in
> languages that I don't know. [1] Hmmm… On the Perl side, I am having
> problems installing Image::OCR::Tesseract.
>
> [1] Wrappers - http://code.google.com/p/tesseract-ocr/wiki/AddOns
>
> --
> Eric "Still Cogitating" Morgan