Hi Eric,

On Thu, Oct 17, 2013 at 09:43:04AM -0400, Eric Lease Morgan wrote:
> Robert, can you outline the process you used to get Tesseract to do
> OCR against PDF documents? I installed Tesseract a few months ago,
> but I couldn't figure out how to get it to work against PDF, only some
> image files. Any pointers would be greatly appreciated. (Hmmm. Maybe
> Tesseract doesn't do PDF files, only image files, and I need to
> convert my PDFs to images, and then pass them to Tesseract.) --Eric Morgan

Once you have Tesseract installed, the easiest way to add an OCR text
layer to PDF files is, IMHO, this Ruby script:

https://github.com/gkovacs/pdfocr

Geza Kovacs wrote it for Cuneiform and an old version of OCRopus. I
added Tesseract support later.

If you cannot use Ruby for some reason, I could upload a Bash script
that does the same thing.

Cheers,
Christian

-- 
Christian Pietsch · http://purl.org/net/pietsch
LibTec · Library Technology and Knowledge Management
Bielefeld University Library, Bielefeld, Germany
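
P.S. The manual route guessed at in the quoted question (convert the PDF pages to images, then run Tesseract on each image) can be sketched roughly as below. This is only an illustration, not what pdfocr does internally, and it produces plain text rather than a PDF with a text layer. It assumes the `pdftoppm` tool from Poppler and the `tesseract` binary are both on the PATH; the function names are made up for this example.

```python
import glob
import os
import subprocess
import tempfile


def pdftoppm_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm call that renders every PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the tesseract call; tesseract writes its text to <output_base>.txt."""
    return ["tesseract", image_path, output_base, "-l", lang]


def page_number(image_path):
    """Extract the page number from a name such as 'page-07.png'."""
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return int(stem.rsplit("-", 1)[1])


def ocr_pdf(pdf_path, lang="eng", dpi=300):
    """Return a list with the OCR text of each page, in page order."""
    texts = []
    with tempfile.TemporaryDirectory() as workdir:
        prefix = os.path.join(workdir, "page")
        # Step 1: PDF -> one PNG per page (page-1.png, page-2.png, ...).
        subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)
        # Step 2: run Tesseract on each page image, sorted by page number.
        for image in sorted(glob.glob(prefix + "-*.png"), key=page_number):
            base = os.path.splitext(image)[0]
            subprocess.run(tesseract_command(image, base, lang), check=True)
            with open(base + ".txt", encoding="utf-8") as handle:
                texts.append(handle.read())
    return texts
```

Calling `ocr_pdf("scan.pdf")` (a hypothetical file name) would return one string per page. Rendering at about 300 dpi is a common starting point for OCR, but the best value depends on the scans.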