In the past we used OCR Shop XTR from Vividata under Linux (a
command-line utility, not an API). It scripts nicely under Linux and
has given us decent results despite some quirks, which may have been
fixed by now, since we purchased our copy in 2003.
It looks like they now offer a version with image-over-text PDF output:
http://vividata.com/ocr_comparison.html
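
For what it's worth, here is a minimal sketch of how the tiffcp and
tiff2pdf steps from the quoted message could be driven from Python, with
a slot left open for a command-line OCR tool. The function names and the
ocr_command parameter are hypothetical, made up for illustration. No OCR
Shop XTR options are shown, since those would have to come from
Vividata's own documentation.

```python
import subprocess
from pathlib import Path


def build_commands(tiff_pages, work_dir, stem):
    """Return the argv lists for the stitch and PDF steps (nothing is run)."""
    work_dir = Path(work_dir)
    combined = work_dir / f"{stem}.tif"
    pdf = work_dir / f"{stem}.pdf"
    # tiffcp: concatenate the page images into one multi-page TIFF.
    stitch = ["tiffcp", *[str(p) for p in tiff_pages], str(combined)]
    # tiff2pdf: wrap the multi-page TIFF in an image-only PDF.
    to_pdf = ["tiff2pdf", "-o", str(pdf), str(combined)]
    return [stitch, to_pdf]


def run_pipeline(tiff_pages, work_dir, stem, ocr_command=None):
    """Run the stitch and PDF steps, then an optional OCR command.

    ocr_command is a hypothetical, caller-supplied argv list for whatever
    command-line OCR tool is available; its arguments are not assumed here.
    """
    for cmd in build_commands(tiff_pages, work_dir, stem):
        subprocess.run(cmd, check=True)  # raises if a step fails
    if ocr_command is not None:
        subprocess.run(ocr_command, check=True)
```

Calling run_pipeline(["p1.tif", "p2.tif"], "out", "doc") would leave
out/doc.tif and out/doc.pdf behind, assuming both libtiff tools are on
the PATH and the out directory already exists.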
-- Alberto
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> James Tuttle
> Sent: Friday, October 17, 2008 7:57 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] OCR PDFs
>
> I wonder if any of you might have experience with creating text PDFs
> from TIFFs. I've been using tiffcp to stitch TIFFs together into a
> single image and then using tiff2pdf to generate PDFs from the single
> TIFF. I've had to pass this image-based PDF to someone with Acrobat to
> use its batch processing facility to OCR the text and save a text-based
> PDF. I wonder if anyone has suggestions for software I can integrate
> into the script (Python on Linux) I'm using.
>
> Thanks,
> James
>
--
Dr. Alberto Accomazzi aaccomazzi(at)cfa harvard edu
Project Manager
NASA Astrophysics Data System ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics www.cfa.harvard.edu
60 Garden St, MS 67, Cambridge, MA 02138, USA