This Jupyter notebook from the National Library of Scotland has a section on evaluating OCR accuracy under the Data Cleaning chapter. You might also check out the 'fastwer' package described in this article. I have not used it myself, so I cannot vouch for it.

Sarah Swanz
University of Michigan, School of Information (2018)

On Sep 2, 2021, 3:09 PM -0500, Kimberly Kennedy wrote:
> Hello!
>
> I was wondering if anyone has created a script or tool to compare the words
> in a text file to a dictionary? I'm looking for a way to quantify the
> quality of OCR output. I've heard that counting the number of words that
> are in the dictionary is a good quick and dirty way to do this, but I would
> like to be able to run this script on larger batches of text files so I can
> compare OCR engines (not count words manually).
>
> Let me know if you have any existing tools or thoughts about how to go
> about this!
>
> Thanks,
>
> Kim
>
> Kimberly Kennedy
> Digital Production Coordinator
> Northeastern University Library
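
For the dictionary-lookup approach described in the question, here is a minimal Python sketch that scores a batch of text files. It is not taken from the notebook or the 'fastwer' package mentioned above. The function names are hypothetical, and it assumes a plain-text word list with one word per line (for example, the system word list found on many Unix machines) and UTF-8 `.txt` files in a single folder. The resulting ratio is only a rough proxy for OCR quality, since it needs no ground truth: proper nouns and historical spellings will count as misses.

```python
import string
from pathlib import Path


def load_dictionary(path):
    """Read a one-word-per-line word list into a lowercase set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def dictionary_ratio(text, dictionary):
    """Return (known, total, ratio) for the word tokens in text.

    Tokens are split on whitespace and stripped of surrounding
    punctuation. Empty and purely numeric tokens are skipped.
    """
    known = 0
    total = 0
    for raw in text.split():
        token = raw.strip(string.punctuation).lower()
        if not token or token.isdigit():
            continue
        total += 1
        if token in dictionary:
            known += 1
    ratio = known / total if total else 0.0
    return known, total, ratio


def score_directory(text_dir, dictionary):
    """Score every .txt file in text_dir; returns {filename: ratio}."""
    scores = {}
    for path in sorted(Path(text_dir).glob("*.txt")):
        # errors="replace" keeps the batch running on badly encoded files
        text = path.read_text(encoding="utf-8", errors="replace")
        scores[path.name] = dictionary_ratio(text, dictionary)[2]
    return scores
```

To compare OCR engines, one option is to keep each engine's output in its own folder, call `score_directory` on each folder with the same dictionary, and compare the ratios file by file.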