I put together some patches for Tesseract, available on GitHub [1], that determine the coordinates of bounding boxes. ABBYY offers this as an extra feature, and it is invaluable for activities like highlighting search terms on the original image.

For many materials, I think Tesseract is a serious rival to ABBYY for accuracy. One of the big factors seems to be how much contrast can be introduced into the source image to separate the characters from the background.

ABBYY has impressive options for enlisting multiple machines to process large quantities of scanned images, but that path is fairly pricey and it is a very Windows-centric solution. Tesseract can fit into a Hadoop framework, which would be one approach for large quantities of materials and is more platform independent.

ABBYY will probably come close to delivering the best OCR can offer straight out of the box, but Tesseract is worth the extra hoops if you have a steady stream of incoming material, especially if the material is going straight from the page to the scanner and is not the "image of an image" situation encountered with things like scans of microfilm reels.

art

---
1. https://github.com/artunit/ossocr