On Fri, 17 Oct 2008, James Tuttle wrote:

> I wonder if any of you might have experience with creating text PDFs
> from TIFFs.  I've been using tiffcp to stitch TIFFs together into a
> single image and then using tiff2pdf to generate PDFs from the single
> TIFF.  I've had to pass this image-based PDF to someone with Acrobat to
> use its batch processing facility to OCR the text and save a text-based
> PDF.  I wonder if anyone has suggestions for software I can integrate
> into the script (Python on Linux) I'm using.

I don't, but I've used the batch processing of Acrobat before to do the 
OCR -- and let me suggest that you make sure to back up the files before 
running the batch.

I selected the wrong option, and instead of ending up with image+text, it
stripped out the image and saved over the original files, wiping out a
week's worth of scanning for me.
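
Since the script is in Python anyway, a backup step along these lines could
run before the files go anywhere near the batch job.  This is only a minimal
sketch: the function name backup_scans and both directory arguments are
made up for illustration, and it assumes the backup location is outside the
scan directory.

```python
import shutil
import time
from pathlib import Path


def backup_scans(scan_dir, backup_root):
    """Copy everything under scan_dir into a new timestamped folder
    under backup_root and return the path of that folder.

    Hypothetical helper: the name and arguments are illustrative only.
    backup_root should live outside scan_dir.
    """
    scan_dir = Path(scan_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{scan_dir.name}-{stamp}"
    # copytree raises an error if dest already exists, so an earlier
    # backup is never silently overwritten.
    shutil.copytree(scan_dir, dest)
    return dest
```

Calling backup_scans("scans", "/some/backup/location") before each batch run
would leave an untouched copy to fall back on if the wrong save option gets
picked.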

I've also never found a good way of editing the 'tags' that Acrobat
generates.  It marks up each line of the document as a new paragraph,
and I couldn't find any good tools to merge the tags (although I was
running an older version of Acrobat ... 6, I think).

-Joe