Hi folks,
I have a number of typescript/manuscript images for which OCR is quite
time-consuming (or, more accurately, correcting the OCR output is quite
time-consuming).
For some of these I already have plain-text files containing accurate
transcriptions; for others I have the transcriptions as TEI files.
What is a straightforward way to combine the text and the images, with
each image overlaid on its text, to create searchable PDFs?
I know my way around the command line and can follow tutorials, but I'm
not a programmer, so the more straightforward the solution, the better.
I have had a go with pdftkBuilder, and one result can be seen here:
[https://www.dropbox.com/s/fxp6rnt24043aez/result3.pdf]. However, there
are a number of problems:
1. It involves 'printing' the text to PDF and then 'stamping' the image
over it. The result has a margin unless the image happens to match a
standard paper size.
2. The underlying text doesn't line up with the image. I would love it
if it could, but I can live with it if it can't.
3. It is very time-consuming; ideally I would like a solution that
could be scripted and left to run.
Any advice would be greatly appreciated.
The best I have
-- 
Padraic
Padraic Stack | Digital Humanities Support Officer | NUI Maynooth | Phone: Mon: 01 474 7187; Tue-Fri: 01 474 7197