On Fri, 17 Oct 2008, James Tuttle wrote:
> I wonder if any of you might have experience with creating text PDFs
> from TIFFs. I've been using tiffcp to stitch TIFFs together into a
> single image and then using tiff2pdf to generate PDFs from the single
> TIFF. I've had to pass this image-based PDF to someone with Acrobat to
> use its batch processing facility to OCR the text and save a text-based
> PDF. I wonder if anyone has suggestions for software I can integrate
> into the script (Python on Linux) I'm using.
I don't, but I've used the batch processing of Acrobat before to do the
OCR -- and let me suggest that you make sure to back up the files before
running the batch.
I selected the wrong option, and instead of ending up with image+text, it
stripped out the images and saved over the original files (wiping out a
week's worth of scanning for me).
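Since the script in question is Python on Linux, the backup step could be
sketched roughly like this. It is only a minimal sketch, assuming the scans
all live in one directory; the function name backup_scans and any paths
passed to it are hypothetical, not part of any existing tool.

```python
import shutil
import time
from pathlib import Path


def backup_scans(source_dir, backup_root):
    """Copy source_dir into a new timestamped folder under backup_root.

    Hypothetical helper: names and layout are assumptions for illustration.
    Returns the path of the backup copy.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")

    # A timestamp in the folder name keeps successive backups apart.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"

    # copytree raises FileExistsError if target already exists, so an
    # earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

Calling something like backup_scans("/path/to/scans", "/path/to/backups")
before handing the files to the batch job leaves an untouched copy to fall
back on if the batch saves over the originals.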
I've also never found a good way of editing the 'tags' that Acrobat
generates. It marks up each line of the document as a new paragraph, and
I couldn't find any good tools to merge the tags (although I was running
an older version of Acrobat ... 6, I think).
-Joe