Hi folks,
I have a number of typescript and manuscript page images on which OCR is
quite time-consuming (or, more accurately, correcting the OCR output is).
For some of these I have text files containing accurate transcriptions.
In other cases I have TEI files with these transcriptions.
What is a straightforward way to combine the text with the images, with
each image overlaid on its text, to create searchable PDFs?
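For the TEI files, one way to get plain text for each page image is to walk the `<body>`, start a new chunk at each `<pb/>` (page break) and a new line at each `<lb/>` (line break). The sketch below uses only the Python standard library. It assumes the files use the TEI P5 namespace (`http://www.tei-c.org/ns/1.0`) and actually mark page and line breaks this way. `tei_pages` is a hypothetical helper name, and everything inside `<body>` (including notes or deletions) is kept as plain text.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # assumption: TEI P5 namespace
PB, LB = TEI + "pb", TEI + "lb"
LINE_SEP = "\x00"  # placeholder for <lb/>, so newlines in the XML source can be collapsed


def tei_pages(xml_data: bytes) -> list[str]:
    """Split a TEI <body> into one text chunk per <pb/>, one line per <lb/>."""
    body = ET.fromstring(xml_data).find(f".//{TEI}body")
    if body is None:
        raise ValueError("no TEI <body> element found")
    pages: list[list[str]] = [[]]

    def walk(elem: ET.Element) -> None:
        if elem.tag == PB:
            pages.append([])  # a page break starts a new chunk
        elif elem.tag == LB:
            pages[-1].append(LINE_SEP)
        elif elem.text:
            pages[-1].append(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text after a child belongs to the current page
                pages[-1].append(child.tail)

    walk(body)
    chunks = []
    for parts in pages:
        # Collapse whitespace inside each line, keep only the <lb/> line breaks.
        lines = (" ".join(seg.split()) for seg in "".join(parts).split(LINE_SEP))
        chunks.append("\n".join(line for line in lines if line))
    # Text before the first <pb/> is usually just whitespace; drop it if empty.
    if chunks and not chunks[0]:
        chunks = chunks[1:]
    return chunks
```

For example, `tei_pages(Path("letter.xml").read_bytes())` (with `from pathlib import Path`) would return a list with one string per `<pb/>`. Those strings could then be saved as one text file per page image.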
I know my way around the command line and can follow tutorials, but I'm
not a programmer, so the more straightforward the solution the better.
I have had a go with pdftkBuilder, and one result can be seen here
[https://www.dropbox.com/s/fxp6rnt24043aez/result3.pdf], but there are a
number of problems:
1. It involves 'printing' the text to PDF and 'stamping' the image over
it. The result has a margin unless the image matches a standard paper
size.
2. The underlying text doesn't line up with the image. I would love it if
it did, but I can live with it if that isn't possible.
3. It is very time-consuming. Ideally I would like a solution that could
be scripted and left to run.
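A rough sketch of a scriptable alternative to printing and stamping follows. It assumes the scans are JPEG files, the transcriptions are UTF-8 text whose characters fit Windows-1252 (anything else becomes `?`), and the scanning resolution (`dpi`) is known. It writes each PDF directly with the Python standard library. The page size is computed from the image's pixel size, so there is no margin. The JPEG is embedded unchanged, and the text is drawn in PDF text render mode 3 (invisible), which keeps it searchable. The line spacing only spreads the lines evenly down the page, so it does not solve problem 2. True alignment would need word positions (for example from OCR output), which a plain transcription does not contain. All function names and the `NAME.jpg`/`NAME.txt` pairing used by `convert_folder` are hypothetical.

```python
import struct
from pathlib import Path


def jpeg_info(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, colour components) read from a JPEG's SOF header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt JPEG marker stream")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        seg_len = struct.unpack(">H", data[i + 2:i + 4])[0]
        # SOF0..SOF15 hold the image size; C4, C8 and CC are other segment types.
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        i += 2 + seg_len
    raise ValueError("no SOF segment found")


def pdf_string(line: str) -> bytes:
    """Encode a line as a PDF literal string (Windows-1252, unknown chars -> '?')."""
    raw = line.encode("cp1252", errors="replace")
    raw = raw.replace(b"\\", b"\\\\").replace(b"(", b"\\(").replace(b")", b"\\)")
    return b"(" + raw + b")"


def build_pdf(objects: list[bytes]) -> bytes:
    """Number the objects 1..n, then append the xref table and trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n0000000000 65535 f \n".encode()
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()  # each entry is exactly 20 bytes
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)


def page_pdf(jpeg: bytes, text: str, dpi: float = 300.0, margin: float = 36.0) -> bytes:
    """One page: the scan fills the page; the transcription lies on it as invisible text."""
    w_px, h_px, comps = jpeg_info(jpeg)
    colour = {1: "/DeviceGray", 3: "/DeviceRGB"}.get(comps)
    if colour is None:
        raise ValueError(f"unsupported JPEG with {comps} colour components")
    # Page size in points (1/72 inch) taken from the image, so there is no white margin.
    page_w, page_h = w_px * 72 / dpi, h_px * 72 / dpi
    lines = text.splitlines() or [""]
    # Crude layout: spread the lines evenly down the page. This is NOT real alignment.
    leading = max((page_h - 2 * margin) / len(lines), 1.0)
    size = leading * 0.8
    ops = [
        b"q",
        f"{page_w:.2f} 0 0 {page_h:.2f} 0 0 cm /Im0 Do".encode(),  # image fills the page
        b"Q",
        b"BT 3 Tr",  # text render mode 3: glyphs are neither filled nor stroked
        f"/F1 {size:.2f} Tf {leading:.2f} TL".encode(),
        f"{margin:.2f} {page_h - margin - size:.2f} Td".encode(),
    ]
    ops += [pdf_string(line) + b" Tj T*" for line in lines]  # T* moves to the next line
    ops.append(b"ET")
    content = b"\n".join(ops)
    image = (f"<< /Type /XObject /Subtype /Image /Width {w_px} /Height {h_px} "
             f"/ColorSpace {colour} /BitsPerComponent 8 /Filter /DCTDecode "
             f"/Length {len(jpeg)} >>\nstream\n").encode() + jpeg + b"\nendstream"
    page = (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_w:.2f} {page_h:.2f}] "
            "/Resources << /XObject << /Im0 4 0 R >> /Font << /F1 5 0 R >> >> "
            "/Contents 6 0 R >>").encode()
    return build_pdf([
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        page,
        image,
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding /WinAnsiEncoding >>",
        f"<< /Length {len(content)} >>\nstream\n".encode() + content + b"\nendstream",
    ])


def convert_folder(folder: str, dpi: float = 300.0) -> None:
    """Hypothetical layout: for each NAME.jpg with a NAME.txt beside it, write NAME.pdf."""
    for jpg in sorted(Path(folder).glob("*.jpg")):
        txt = jpg.with_suffix(".txt")
        if txt.exists():
            pdf = page_pdf(jpg.read_bytes(), txt.read_text(encoding="utf-8"), dpi)
            jpg.with_suffix(".pdf").write_bytes(pdf)
```

Called as `convert_folder("scans")`, this could be left to run over a whole folder. It produces one single-page PDF per scan. Joining these into one document could then be done with pdftk's `cat` operation (for example `pdftk p1.pdf p2.pdf cat output letter.pdf`).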
Any advice would be greatly appreciated.
The best I have
--
Padraic
Padraic Stack | Digital Humanities Support Officer | NUI Maynooth | Phone: Mon: 01 474 7187 Tue - Fri: 01 474 7197