You might take a look at Tesseract [1]. On a typical Linux box:

    $ tesseract input.tif outputName hocr

renders HTML with some coordinate information. You might be able to convert that output to ALTO.

Cheers,
Bridger

[1] http://code.google.com/p/tesseract-ocr/

On Thu, Sep 6, 2012 at 8:29 AM, Michael Beccaria wrote:
> I inadvertently purchased ABBYY Finereader 11 Corporate thinking that it
> would be capable of outputting to ALTO XML. I was wrong. ABBYY Finereader
> Engine does :/
>
> Ultimately, I want to OCR some newspaper images and export them to ALTO
> XML and, until the proof of concept is done, I want to try to do it on the
> cheap. My plan this morning was to write some scripts to OCR them using
> Microsoft Office Document Imaging (MODI) and then export the results to
> ALTO XML, which could be a big project. Has anyone done this before, or
> does anyone know of a quick and dirty way to get some OCR data?
> Thanks,
> Mike Beccaria
> Systems Librarian
> Paul Smith's College
> 518.327.6376
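
A rough sketch of the hOCR post-processing step suggested in the reply, not a finished converter. It assumes the hOCR file marks each word as a span with class "ocrx_word" and carries a "bbox x0 y0 x1 y1" entry in its title attribute, which is how the hOCR format describes word boxes; the exact markup and the output file name (outputName.html or outputName.hocr) may vary with the Tesseract version. The names extract_words and alto_string_attrs are made up for illustration, and mapping pixel boxes straight onto ALTO's HPOS/VPOS/WIDTH/HEIGHT assumes the ALTO file declares pixels as its measurement unit.

```python
import re
from html.parser import HTMLParser

# Matches the "bbox x0 y0 x1 y1" entry inside an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for each word-level span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # box of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes word spans contain no nested spans.
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, *self._bbox))
            self._bbox = None


def extract_words(hocr_markup):
    """Return a list of (text, x0, y0, x1, y1) tuples from hOCR markup."""
    parser = HocrWordParser()
    parser.feed(hocr_markup)
    parser.close()
    return parser.words


def alto_string_attrs(word):
    """Map one word tuple to the attributes of an ALTO String element."""
    text, x0, y0, x1, y1 = word
    return {
        "CONTENT": text,
        "HPOS": x0,
        "VPOS": y0,
        "WIDTH": x1 - x0,
        "HEIGHT": y1 - y0,
    }
```

To try it, read the file Tesseract wrote, pass its contents to extract_words, and feed each tuple to alto_string_attrs. Building the surrounding ALTO structure (Page, PrintSpace, TextBlock, TextLine) is left out here.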