On Oct 16, 2013, at 10:56 AM, Robert Haschart wrote:
> The abstract extraction routine I have been working on does use
> tesseract internally for doing OCR when it encounters a document that
> doesn't have usable full-text. I agree that tesseract is not that easy
> to install, especially if (as in my case) you do not have root/sudo
> access to the machine. Since I have gone through installing tesseract
> quite recently, perhaps my experience can be helpful to you.
Robert, can you outline the process you used to get Tesseract to do OCR against PDF documents? I installed Tesseract a few months ago, but I couldn't figure out how to get it to work against PDFs, only against some image files. Any pointers would be greatly appreciated. (Hmmm. Maybe Tesseract doesn't do PDF files, only image files, and I need to convert my PDFs to images and then feed them to Tesseract.)

--Eric Morgan
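
For what it's worth, the convert-then-OCR idea in the parenthetical can be sketched roughly as follows. This is only an illustration under stated assumptions, not the process Robert used: it assumes Poppler's `pdftoppm` and the `tesseract` command-line tool are both installed and on the PATH, and the function names (`pdftoppm_command`, `tesseract_command`, `page_number`, `ocr_pdf`) are made up for this example. Tesseract itself takes image files as input, so each PDF page is first rendered to a PNG.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftoppm_command(pdf_path, out_prefix, dpi=300):
    """Build the Poppler command that renders every PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def tesseract_command(image_path, out_base, lang="eng"):
    """Build the Tesseract command; it writes its text to <out_base>.txt."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def page_number(image_path):
    """Extract the page number from a name such as 'page-012.png'."""
    return int(Path(image_path).stem.rsplit("-", 1)[1])


def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Render a PDF to page images, OCR each page, and return the joined text.

    Assumes pdftoppm and tesseract are installed (an assumption of this sketch).
    """
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)

        pages = []
        # Sort by the numeric suffix so pages stay in document order.
        for image in sorted(Path(tmp).glob("page-*.png"), key=page_number):
            out_base = image.with_suffix("")  # e.g. /tmp/xyz/page-1
            subprocess.run(tesseract_command(image, out_base, lang), check=True)
            pages.append(Path(str(out_base) + ".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling `ocr_pdf("scan.pdf")` (with a hypothetical file name) would return the recognized text of all pages as one string. The 300 dpi default is just a commonly used starting point for OCR; lower resolutions render faster but tend to hurt recognition.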