LISTSERV 16.5 - CODE4LIB Archives

Hi,

I want to add EPUB output to our scanning workflow, just as the
Internet Archive does. However, looking at their code, it appears they are
using ABBYY FineReader for OCR.

We're using Tesseract to make hOCR files, which we combine with the
images to make PDFs. Has anyone converted hOCR files to EPUB?

I want to avoid converting from PDF or DjVu to EPUB, since the output from
that route is usually very bad.
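
To make the question concrete, here is a rough sketch of the hOCR-to-XHTML
step I have in mind. It assumes the hOCR marks paragraphs with the
"ocr_par" class, as Tesseract's output normally does. The names
HocrParagraphs and hocr_to_xhtml are made up for illustration, and the
sketch only produces one XHTML content document per page; it does not
build the EPUB container (mimetype, container.xml, package file) around it.

```python
from html import escape
from html.parser import HTMLParser


class HocrParagraphs(HTMLParser):
    """Collect the plain text of each <p class="ocr_par"> in an hOCR page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._parts = None  # None while we are outside a paragraph

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "ocr_par" in classes:
            self._parts = []

    def handle_data(self, data):
        # Inside a paragraph, hOCR text is just the words plus whitespace.
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            # Collapse line breaks and runs of spaces into single spaces.
            text = " ".join("".join(self._parts).split())
            if text:
                self.paragraphs.append(text)
            self._parts = None


def hocr_to_xhtml(hocr, title="Scanned page"):
    """Return a minimal XHTML document with one <p> per hOCR paragraph."""
    parser = HocrParagraphs()
    parser.feed(hocr)
    parser.close()
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in parser.paragraphs)
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"  <head><title>{escape(title)}</title></head>\n"
        "  <body>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )
```

This throws away the bounding boxes and does nothing about words
hyphenated across line breaks, so it is only a starting point.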

Thanks.. b,chris.