It depends on the language. A few years ago I tested many packages for
Roman-alphabet languages, mainly English, French, Dutch, and German. In
terms of accuracy, ABBYY was the best.

Karim Boughida

On Sat, Nov 5, 2011 at 5:08 PM, Art W Rhyno wrote:
> I put together some patches on GitHub for determining the coordinates
> of bounding boxes with Tesseract [1]. ABBYY offers this as an extra
> feature, and it is invaluable for activities like highlighting search
> terms on the original image. For many materials, I think Tesseract is
> a serious rival to ABBYY for accuracy; one of the big factors seems to
> be how much contrast can be introduced into the source image to
> separate the characters from the background. ABBYY has impressive
> options for enlisting multiple machines for large quantities of
> scanned images, but that path is fairly pricey and it is a very
> Windows-centric solution. Tesseract can fit into a Hadoop framework,
> which would be one approach for large quantities of materials and is
> more platform independent. ABBYY will probably come close to
> delivering the best that OCR can offer straight out of the box, but
> Tesseract is worth the extra hoops if you have a steady stream of
> incoming material, especially if the material is going straight from
> the page to the scanner and is not an "image of an image," as with
> scans of microfilm reels.
>
> art
> ---
> 1. https://github.com/artunit/ossocr