Howdy all,

I've just started a project that involves harvesting large numbers of scanned PDFs and extracting information from their OCR text. The process I've started with -- using ImageMagick to convert each PDF to TIFF and Tesseract to run the OCR -- is more system-intensive than I hoped it would be. Is there an easier or faster process that I'm missing? Perl-friendly solutions are preferred, because this fits in as part of a larger process.

If I am already using my best option, what image parameters are recommended if I want to reach the point of diminishing returns rather than the best possible quality?

Thanks,
kyle
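
For reference, the two-step process described above can be sketched roughly as follows. This is only an illustration, not the exact commands in use: the 300 DPI density, the grayscale conversion, the file names, and the helper function names are all assumptions. The script is a dry run that prints the commands rather than executing them.

```shell
#!/bin/sh
# Dry-run sketch of the PDF -> TIFF -> OCR pipeline.
# Assumed values (not from the post): 300 DPI, grayscale, file names.

DENSITY="${DENSITY:-300}"   # assumed rendering resolution in DPI

# Print the ImageMagick command that rasterizes one PDF to a TIFF.
convert_cmd() {
    printf 'convert -density %s %s -colorspace Gray %s\n' "$DENSITY" "$1" "$2"
}

# Print the Tesseract command that OCRs a TIFF.
# Tesseract appends .txt to the output base name itself.
ocr_cmd() {
    printf 'tesseract %s %s\n' "$1" "$2"
}

# Print both steps for a single PDF (file names without spaces assumed).
pipeline_cmds() {
    base="${1%.pdf}"
    convert_cmd "$1" "$base.tif"
    ocr_cmd "$base.tif" "$base"
}
```

Calling `pipeline_cmds scan.pdf` prints the two commands for one file. The density value is the most obvious knob to experiment with when trading OCR accuracy against CPU time and disk use.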