Erica:

I've used Paperwork (https://openpaper.work/en/) in the past with good 
results. It's open source and runs on Linux and Windows. If you'd be 
interested in running a web application you might give some of these 
options a look (https://github.com/kba/awesome-ocr#ocr-gui) or maybe 
even look into a document management web application 
(https://github.com/awesome-selfhosted/awesome-selfhosted#document-management) 
though the latter might be overkill for your use case.

Finally, if you're running a Mac somewhere and have money to spend, I 
cannot overstate how much I love DevonThink 
(https://www.devontechnologies.com/apps/devonthink) which has a server 
version and uses ABBYY on the backend. My quick test this morning 
suggests it doesn't have the issue you're describing.

best,

ak

--
ander kierig
Web Application Developer
University of Minnesota Libraries
[lib.umn.edu](https://www.lib.umn.edu)
they/them

On 2022-08-05 at 18:12 (-0500) Erica FINDLEY wrote:

> All,
>
> ABBYY has been a favorite program of mine for transforming batches of TIFF
> files into a PDF and extracting the text.
>
> However, I have recently run into this known issue
> <https://support.abbyy.com/hc/en-us/articles/360013874239-Each-page-is-duplicated-with-the-thumbnail-image-while-converting-TIFF-to-PDF-in-FineReader>
> even though each TIFF file is the same resolution.
>
> I opened a support ticket with ABBYY and their proposed resolution is for
> me to convert to another format (jpg) then to pdf. I do not like this for
> two reasons: 1) it is time- and resource-consuming to do two
> transformations and 2) there is some image quality loss when doing this.
>
>
> This leaves me with two questions:
>
> 1. Has anyone been able to find a better workaround for this issue?
>
> 2. Does anyone have recommendations for another GUI-based OCR program? My
> quick research is pointing to Tesseract, but since I work with volunteers
> I'd prefer a GUI-based solution.
>
> Thanks!
>
> *Erica Findley (she/her)*
> *Systems & Metadata Librarian*
> *x80591*
> Multnomah County Library
> Isom Operations Center: Thu 8 am - 5 pm, Fri 1:30 pm - 5:30 pm
> Teleworking: Mon - Wed 8 am - 5 pm, Fri 8 am - 12 pm
> multcolib.org <http://www.multcolib.org/>
> My pronouns are she/her/hers