LISTSERV 16.5 - CODE4LIB Archives

Hi Kim,

One option is to use the pdfimages utility from Poppler to extract
all the images from the PDF into a directory.  You would then place
the corresponding hOCR files in the same directory, giving each one
the same base name as its image, and run the hocr-pdf utility from
hocr-tools to build the searchable PDF.

Both software packages are readily available on many Linux systems.

https://poppler.freedesktop.org/
https://github.com/tmbdev/hocr-tools
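
In case it helps, here is a rough Python sketch of those two steps. It
assumes pdfimages and hocr-pdf are both on your PATH, that the page
images in the PDF are JPEG-encoded (so "pdfimages -j" writes .jpg
files), and that you have already named each hOCR file after its image
(for example page-000.jpg and page-000.hocr). The "page" prefix and the
function names are just placeholders I made up for illustration.

```python
import subprocess
from pathlib import Path


def extract_images(pdf_path, work_dir):
    """Dump the images embedded in pdf_path into work_dir using pdfimages."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # -j saves DCT (JPEG) images as .jpg; "page" is a hypothetical
    # file name prefix, giving page-000.jpg, page-001.jpg, ...
    subprocess.run(
        ["pdfimages", "-j", str(pdf_path), str(work_dir / "page")],
        check=True,
    )


def unmatched_images(work_dir):
    """Return the names of .jpg files that have no same-named .hocr file."""
    work_dir = Path(work_dir)
    return sorted(
        p.name
        for p in work_dir.glob("*.jpg")
        if not p.with_suffix(".hocr").exists()
    )


def build_searchable_pdf(work_dir, out_pdf):
    """Combine the image/hOCR pairs in work_dir into one PDF with hocr-pdf."""
    missing = unmatched_images(work_dir)
    if missing:
        raise FileNotFoundError(f"no hOCR file for: {', '.join(missing)}")
    # hocr-pdf writes the finished PDF to standard output.
    with open(out_pdf, "wb") as out:
        subprocess.run(["hocr-pdf", str(work_dir)], stdout=out, check=True)
```

You would call extract_images("input.pdf", "work"), copy the hOCR files
into "work", and then call build_searchable_pdf("work", "output.pdf").
The check for unmatched images is only there to catch naming mismatches
before hocr-pdf runs.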

Thanks,
Rasan
NYU Digital Library


On Wed, May 6, 2020 at 2:42 PM Kimberly Kennedy wrote:

> I have an unusual situation. I've created a PDF that I want to be text
> searchable. However, I would like to use OCR data from a different source
> than that document. Is it possible to add a text file as the OCR layer to
> an existing PDF?
>
> Any ideas would be appreciated!
>
> Thanks,
>
> Kim
>
>
> Kimberly Kennedy
> Digital Production Coordinator
> Northeastern University Library
>