But Raffaele, how do you generate the hOCR in the first place if you're using human-generated transcripts and not OCR? Hand coding each page would take forever.

On Fri, Jan 17, 2014 at 3:24 AM, raffaele messuti wrote:
> Padraic Stack wrote:
> > What is a straightforward way to combine the text with overlaid images
> > to create searchable pdfs?
>
> having transcription in hOCR[1] format the tool you should need is
> hocr2pdf[2].
> i never tried for pdfs, years ago i made some djvu following this
> tutorial[3]
>
> [1] http://en.wikipedia.org/wiki/HOCR
> [2] http://manpages.ubuntu.com/manpages/lucid/man1/hocr2pdf.1.html
> [3] https://philikon.wordpress.com/2009/07/23/digitizing-books-to-djvu/
>
> ciao.
>
> --
> raffaele
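
One possible way around hand coding, offered only as a rough sketch: a short script could wrap each line of a plain-text transcript in minimal hOCR markup. The function name transcript_to_hocr and its parameters are made up for illustration, and the bounding boxes are evenly spaced placeholders, not real coordinates, so the hidden text layer would only roughly line up with the page image. Whether hocr2pdf accepts line-level hOCR without word-level boxes is an assumption I have not tested.

```python
from html import escape


def transcript_to_hocr(lines, page_width, page_height, line_height=40, margin=50):
    """Wrap transcript lines in a minimal hOCR page.

    Hypothetical helper: the bounding boxes are evenly spaced placeholders,
    not measured positions of the text on the scanned page.
    """
    spans = []
    for i, text in enumerate(lines):
        top = margin + i * line_height
        bbox = f"bbox {margin} {top} {page_width - margin} {top + line_height}"
        # escape() keeps characters such as & and < from breaking the markup
        spans.append(f'<span class="ocr_line" title="{bbox}">{escape(text)}</span>')

    body = "\n".join(spans)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8"/>\n'
        '<meta name="ocr-system" content="manual transcript"/>\n'
        '<meta name="ocr-capabilities" content="ocr_page ocr_line"/>\n'
        "</head>\n<body>\n"
        f'<div class="ocr_page" title="bbox 0 0 {page_width} {page_height}">\n'
        f"{body}\n"
        "</div>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Example transcript for one page; page size in pixels is assumed.
    page = transcript_to_hocr(["First line of the page", "Second line"], 1000, 1500)
    print(page)
```

The output could be saved as one .hocr file per page and then tried with hocr2pdf together with the matching page image. For text that actually sits over the right words, the placeholder boxes would need to be replaced with coordinates from some layout or alignment step.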