LISTSERV 16.5 - CODE4LIB Archives

This Jupyter notebook from the National Library of Scotland has a section on evaluating OCR accuracy in its Data Cleaning chapter.

You might also check out the 'fastwer' package described in this article. I have not used it myself, so I cannot vouch for it.
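
For the quick-and-dirty dictionary check described in the original question below, a minimal sketch in plain Python might look like the following. It is only an illustration under stated assumptions: the word list is a plain text file with one word per line (many Unix systems ship one at /usr/share/dict/words, but any list will do), each OCR engine's output sits in its own folder of .txt files, and the function names (load_dictionary, dictionary_ratio, score_folder) are made up for this example rather than taken from any existing tool.

```python
import re
import sys
from pathlib import Path

# Runs of letters/digits, optionally joined by apostrophes (e.g. "don't").
TOKEN_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def load_dictionary(path):
    """Read a one-word-per-line word list into a lowercase set (assumed format)."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def dictionary_ratio(text, dictionary):
    """Return (hits, total): tokens found in the dictionary, and tokens checked.

    Purely numeric tokens (years, page numbers) are skipped so they do not
    count against the OCR engine.
    """
    tokens = [t.lower() for t in TOKEN_RE.findall(text) if not t.isdigit()]
    hits = sum(1 for t in tokens if t in dictionary)
    return hits, len(tokens)


def score_folder(folder, dictionary):
    """Map each .txt file name in a folder to its share of dictionary words."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits, total = dictionary_ratio(text, dictionary)
        results[path.name] = hits / total if total else 0.0
    return results


# Hypothetical command-line use: python ocr_dict_score.py WORDLIST FOLDER [FOLDER ...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    words = load_dictionary(sys.argv[1])
    for folder in sys.argv[2:]:
        print(folder)
        for name, ratio in score_folder(folder, words).items():
            print(f"  {name}\t{ratio:.3f}")
```

Running it once per OCR engine's output folder gives a per-file ratio that can be compared across engines. Keep in mind that the ratio is only a rough proxy: proper names, archaic spellings, and hyphenated line breaks all count as misses, and an OCR error that happens to produce a real word counts as a hit.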

Sarah Swanz
University of Michigan, School of Information (2018)
On Sep 2, 2021, 3:09 PM -0500, Kimberly Kennedy <[log in to unmask]>, wrote:
> Hello!
>
> I was wondering if anyone has created a script or tool to compare the words
> in a text file to a dictionary? I'm looking for a way to quantify the
> quality of OCR output. I've heard that counting the number of words that
> are in the dictionary is a good quick and dirty way to do this, but I would
> like to be able to run this script on larger batches of text files so I can
> compare OCR engines (not count words manually).
>
> Let me know if you have any existing tools or thoughts about how to go
> about this!
>
> Thanks,
>
> Kim
>
>
>
> Kimberly Kennedy
> Digital Production Coordinator
> Northeastern University Library
> [log in to unmask]
> [log in to unmask]