Simon Spero wrote:
> Another option is to use the ABBYY FineReader
> Annoyingly, the Linux version is one release behind the Windows SDK (which
> has improved support for multi-core processing of a single document). Since
> Owen's problem is embarrassingly parallel, multi-core tuning isn't as
> useful as being able to run on a local cluster or regional grid. ABBYY
> software tends to be a little pricey, but the results are usually very good.

If you're going to OCR, Nuance OmniPage is also very good, and I believe it
costs about the same as FineReader. We also use TOCR, from Transym, which is
Windows-only but very accurate and cheap. I have yet to see decent results
on complicated pages (technical papers) from either OCRopus or Tesseract
with their default models; I believe both are still aimed at book-page OCR.
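
On the "embarrassingly parallel" point quoted above: since each page can be
OCRed independently, whichever engine you pick can be fanned out over pages
with very little code. Here is a rough Python sketch, not a tested production
setup. The function names (tesseract_page, ocr_pages) and the worker count are
my own placeholders, and it assumes the tesseract command-line tool is on your
PATH; swap in a call to whatever engine you actually use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def tesseract_page(image_path):
    """OCR one page image with the tesseract CLI and return its text.

    Assumes tesseract is installed; passing "stdout" as the output base
    makes it print the recognized text instead of writing a file.
    """
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_pages(image_paths, ocr_page=tesseract_page, workers=4):
    """Run ocr_page on every path concurrently and return {path: text}.

    Threads are enough here because the real work happens in the
    external OCR process; Python only waits on it.
    """
    paths = list(image_paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(ocr_page, paths))
    return dict(zip(paths, texts))
```

Calling ocr_pages() with a list of page image paths returns a dict mapping
each path to its text. On a cluster or grid you would replace the thread pool
with your job scheduler, but the per-page independence is the same.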