LISTSERV 16.5 - CODE4LIB Archives

Hi folks,

I have a number of typescript / manuscript images on which it is quite 
time consuming to run OCR. (Or, more accurately, it is quite time 
consuming to correct the OCR.)

For some of these I have text files containing accurate transcriptions. 
In other cases I have TEI files with these transcriptions.

What is a straightforward way to combine the text with overlaid images 
to create searchable PDFs?

I know my way around the command line and can follow tutorials, but I'm 
not a programmer, so the more straightforward the solution the better.

I have had a go with pdftkBuilder and a result can be seen here 
[https://www.dropbox.com/s/fxp6rnt24043aez/result3.pdf] but there are a 
number of problems:

1. It involves 'printing' the text to PDF and 'stamping' the image over 
it. The result has a margin unless the image matches a standard 
paper size.
2. The underlying text doesn't line up with the image. I would love it if 
it could, but can live with it if it can't.
3. It is very time consuming. Ideally I would like a solution that 
could be scripted and left to run.
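
To make the goal concrete, here is a minimal sketch in Python (standard 
library only) of the kind of output described above: one page sized exactly 
to the image, so there is no margin, with the transcription written as 
invisible text (PDF text rendering mode 3) so it can be searched. It is an 
illustration under stated assumptions, not a tested workflow: the function 
names are made up for this example, the image is assumed to be an RGB JPEG 
whose pixel dimensions and scan resolution are known, and the text is simply 
placed from the top of the page rather than aligned with the lines in the 
image.

```python
def page_size_points(width_px, height_px, dpi):
    """Convert pixel dimensions to PDF points (72 points per inch)."""
    return (width_px * 72.0 / dpi, height_px * 72.0 / dpi)


def escape_pdf_text(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_searchable_pdf(jpeg_bytes, width_px, height_px, dpi, lines):
    """Return the bytes of a one-page PDF: the JPEG plus an invisible text layer."""
    w, h = page_size_points(width_px, height_px, dpi)
    ops = [
        f"q {w:.2f} 0 0 {h:.2f} 0 0 cm /Im0 Do Q",  # scale the image to fill the page
        "BT 3 Tr /F1 10 Tf 12 TL",                  # 3 Tr = invisible text
        f"1 0 0 1 36 {h - 36:.2f} Tm",              # start half an inch from the top left
    ]
    ops += [f"({escape_pdf_text(line)}) Tj T*" for line in lines]
    ops.append("ET")
    # Characters outside Windows-1252 are replaced with "?" in this sketch.
    content = "\n".join(ops).encode("cp1252", "replace")

    def stream(dictionary, data):
        head = f"<< {dictionary} /Length {len(data)} >>\nstream\n".encode("ascii")
        return head + data + b"\nendstream"

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w:.2f} {h:.2f}] "
         "/Resources << /XObject << /Im0 4 0 R >> /Font << /F1 5 0 R >> >> "
         "/Contents 6 0 R >>").encode("ascii"),
        # Assumption: RGB JPEG. A greyscale scan would need /DeviceGray instead.
        stream(f"/Type /XObject /Subtype /Image /Width {width_px} /Height {height_px} "
               "/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode",
               jpeg_bytes),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding /WinAnsiEncoding >>",
        stream("", content),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object, needed for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_start}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

Something like this could be called once per page in a loop, reading each 
JPEG and its transcription file and writing the returned bytes to a .pdf 
file opened in binary mode. A 1200 x 1800 pixel scan at 300 dpi, for 
example, would give a page of 288 x 432 points.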

Any advice would be greatly appreciated.


The best I have

-- 

Padraic


Padraic Stack | Digital Humanities Support Officer | NUI Maynooth | Phone: Mon: 01 474 7187 Tue - Fri: 01 474 7197