But Raffaele, how do you generate the hOCR in the first place if you're
using human-generated transcripts and not OCR? Hand coding each page would
take forever.
On Fri, Jan 17, 2014 at 3:24 AM, raffaele messuti wrote:
> Padraic Stack wrote:
> > What is a straightforward way to combine the text with overlaid images
> > to create searchable pdfs?
>
> If you have the transcription in hOCR[1] format, the tool you need is
> hocr2pdf[2].
> I have never tried it for PDFs; years ago I made some DjVu files following
> this tutorial[3].
>
> [1] http://en.wikipedia.org/wiki/HOCR
> [2] http://manpages.ubuntu.com/manpages/lucid/man1/hocr2pdf.1.html
> [3] https://philikon.wordpress.com/2009/07/23/digitizing-books-to-djvu/
>
> ciao.
>
> --
> raffaele
>
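
On the hOCR question at the top of this message: one possible way to avoid hand coding each page is to wrap the transcript lines in hOCR markup with a small script. The Python sketch below is only an illustration under loud assumptions: the function name `transcript_to_hocr`, the file name `page.tif`, and the evenly spaced line boxes are all made up here, since a human transcript carries no coordinates. Whether hocr2pdf is happy with line-level boxes and no word-level spans has not been checked.

```python
from html import escape


def transcript_to_hocr(lines, page_width, page_height, image_name="page.tif"):
    """Wrap transcript lines in a minimal hOCR page.

    Assumption: no real coordinates exist, so each line gets a full-width
    box and the page height is split evenly between the lines.
    """
    lines = list(lines) or [""]
    line_height = page_height // len(lines)
    spans = []
    for index, text in enumerate(lines):
        top = index * line_height
        bottom = top + line_height
        # hOCR stores the bounding box in the title attribute: x0 y0 x1 y1.
        spans.append(
            f'<span class="ocr_line" title="bbox 0 {top} {page_width} {bottom}">'
            f"{escape(text)}</span>"
        )
    body = "<br/>\n".join(spans)
    return (
        '<html><head><meta charset="utf-8"/></head><body>\n'
        f"<div class=\"ocr_page\" title='image \"{escape(image_name)}\"; "
        f"bbox 0 0 {page_width} {page_height}'>\n"
        f"{body}\n"
        "</div>\n</body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical two-line transcript for a 1000 x 1400 pixel scan.
    print(transcript_to_hocr(["First line", "Second line"], 1000, 1400))
```

Saved to a file such as page.hocr, the output could then be fed to hocr2pdf along the lines of the man page[2], roughly `hocr2pdf -i page.tif -o page.pdf < page.hocr`. The text layer would only be approximately aligned with the image, which may be good enough for search but not for precise highlighting.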