One solution would be to use the pdfimages utility from Poppler to
extract all the images from the PDF into a directory. You would then
place the corresponding hOCR files in that same directory, each named
to match its image, and run the hocr-pdf utility from hocr-tools to
assemble a new PDF with the OCR text as an invisible layer.
Both software packages are readily available on many Linux systems.
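A rough sketch of that workflow in Python follows. It is not a tested recipe, and it rests on several assumptions. It assumes `pdfimages` and `hocr-pdf` are on the PATH. It assumes each page holds exactly one embedded image. It assumes hocr-pdf, as we understand it, reads the `*.jpg` files in a directory and writes the PDF to stdout, and that it pairs each image with a `.hocr` file of the same base name. It also assumes the hOCR from the other source has bounding boxes in the same pixel dimensions as the extracted images. The helper names `pdfimages_cmd`, `unpaired_images`, `extract` and `assemble` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path, PurePath


def pdfimages_cmd(pdf: Path, out_dir: Path, prefix: str = "page") -> list[str]:
    # -j saves JPEG (DCT) images as .jpg files; other image types come out
    # as .ppm/.pbm, which hocr-pdf would not pick up (assumption).
    return ["pdfimages", "-j", str(pdf), str(out_dir / prefix)]


def unpaired_images(filenames: list[str]) -> list[str]:
    """Return the .jpg names that have no .hocr file with the same stem."""
    hocr_stems = {PurePath(n).stem for n in filenames if n.endswith(".hocr")}
    return sorted(
        n for n in filenames
        if n.endswith(".jpg") and PurePath(n).stem not in hocr_stems
    )


def _require(tool: str) -> None:
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} not found on PATH")


def extract(pdf: Path, work_dir: Path) -> None:
    """Step 1: dump the PDF's images (e.g. page-000.jpg, page-001.jpg, ...)."""
    _require("pdfimages")
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_cmd(pdf, work_dir), check=True)


def assemble(work_dir: Path, out_pdf: Path) -> None:
    """Step 2: after copying in page-000.hocr etc., build the searchable PDF."""
    _require("hocr-pdf")
    missing = unpaired_images([p.name for p in work_dir.iterdir()])
    if missing:
        raise RuntimeError("no .hocr file for: " + ", ".join(missing))
    # hocr-pdf is assumed to write the finished PDF to stdout.
    with out_pdf.open("wb") as fh:
        subprocess.run(["hocr-pdf", str(work_dir)], stdout=fh, check=True)
```

You would call `extract(Path("scan.pdf"), Path("work"))`, then rename the external OCR files so that each `.hocr` matches an image (`page-000.hocr` next to `page-000.jpg`, and so on). After that, call `assemble(Path("work"), Path("scan-ocr.pdf"))`. The pairing check runs before hocr-pdf on purpose, so that a misnamed or missing hOCR file fails loudly rather than silently producing a page without text.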
NYU Digital Library
On Wed, May 6, 2020 at 2:42 PM Kimberly Kennedy wrote:
> I have an unusual situation. I've created a PDF that I want to be text
> searchable. However, I would like to use OCR data from a different source
> than that document. Is it possible to add a text file as the OCR layer to
> an existing PDF?
> Any ideas would be appreciated!
> Kimberly Kennedy
> Digital Production Coordinator
> Northeastern University Library