It's not exactly what you're looking for, but Microsoft Office comes with a scriptable OCR engine that works on TIFFs. I use it to extract text from the yearbooks we are scanning so people can search for names and such. While I wouldn't put it on par with ABBYY, it does a pretty decent job. I wrote a simple VBScript that scans all the TIFF files in a folder and, for each image, exports a .txt file with the same name containing all of the text it finds. If you want it, let me know and I'll send it your way.

Mike Beccaria
Systems Librarian
Head of Digital Initiatives
Paul Smith's College
518.327.6376
[log in to unmask]

---
This message may contain confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system.

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of James Tuttle
Sent: Friday, October 17, 2008 7:57 AM
To: [log in to unmask]
Subject: [CODE4LIB] OCR PDFs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I wonder if any of you might have experience with creating text PDFs from TIFFs. I've been using tiffcp to stitch TIFFs together into a single image and then using tiff2pdf to generate a PDF from that single TIFF. I've then had to pass this image-based PDF to someone with Acrobat, who uses its batch processing facility to OCR the text and save a text-based PDF. I wonder if anyone has suggestions for OCR software I can integrate into the script (Python on Linux) I'm using.

Thanks,

James

- --
- -------------------------------
James Tuttle
Digital Repository Librarian
NCSU Libraries, Box 7111
North Carolina State University
Raleigh, NC 27695-7111
[log in to unmask]
(919)513-0651 Phone
(919)515-3031 Fax
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI+H1zKxpLzx+LOWMRAgxIAJwNXyeMJbk6r6hmHpNAdEvWIQbCVgCgp8JR
nyS3WZ4UuRbU/6DTH7ohe/M=
=mT2T
-----END PGP SIGNATURE-----
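
The tiffcp and tiff2pdf steps James describes could be wrapped in Python roughly as follows. This is a minimal sketch, not his actual script: the function names, the `combined` file stem, the separate output directory, and the choice to treat filename order as page order are all assumptions made for illustration. It stops at the image-based PDF; the OCR step is left out because the thread has not settled on a tool for it.

```python
import subprocess
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def tiffcp_command(pages, combined):
    """Command line that concatenates the page TIFFs into one multi-page TIFF."""
    return ["tiffcp", *(str(p) for p in pages), str(combined)]


def tiff2pdf_command(combined, pdf):
    """Command line that wraps the combined TIFF in an image-only PDF."""
    return ["tiff2pdf", "-o", str(pdf), str(combined)]


def build_image_pdf(tiff_dir, out_dir, stem="combined"):
    """Stitch every TIFF in tiff_dir and convert the result to a PDF in out_dir."""
    tiff_dir, out_dir = Path(tiff_dir), Path(out_dir)
    combined = out_dir / f"{stem}.tiff"
    pdf = out_dir / f"{stem}.pdf"
    # Assumption: filename order is page order. A leftover combined file is
    # skipped in case both directories are the same.
    pages = sorted(
        p for p in tiff_dir.iterdir()
        if p.suffix.lower() in TIFF_SUFFIXES and p != combined
    )
    if not pages:
        raise ValueError(f"no TIFF files found in {tiff_dir}")
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if either tool exits non-zero.
    subprocess.run(tiffcp_command(pages, combined), check=True)
    subprocess.run(tiff2pdf_command(combined, pdf), check=True)
    return pdf
```

Calling `build_image_pdf("scans", "out")` (hypothetical directory names) requires the libtiff command-line tools on the PATH. Keeping the command builders separate from the function that runs them makes it easier to inspect the exact command lines, and leaves an obvious place to add an OCR call after `tiff2pdf` once a tool is chosen.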