Sorry, I’m told the links got lost in transit:

Notebook: https://data.nls.uk/tools/jupyter-notebooks/exploring-britain-and-uk-handbooks/

Fastwer article: https://towardsdatascience.com/evaluating-ocr-output-quality-with-character-error-rate-cer-and-word-error-rate-wer-853175297510 (it also links to a GitHub repo with code)

Sarah

On Sep 3, 2021, 10:20 AM -0500, Sarah Swanz wrote:
> This Jupyter notebook from the National Library of Scotland has a section on how to evaluate OCR accuracy under the Data Cleaning chapter.
>
> You might also check out the 'fastwer' package described in this article. I have not used it myself, so I cannot attest to it.
>
> Sarah Swanz
> University of Michigan, School of Information (2018)
>
> On Sep 2, 2021, 3:09 PM -0500, Kimberly Kennedy wrote:
> > Hello!
> >
> > I was wondering if anyone has created a script or tool to compare the words
> > in a text file to a dictionary? I'm looking for a way to quantify the
> > quality of OCR output. I've heard that counting the number of words that
> > are in the dictionary is a good quick and dirty way to do this, but I would
> > like to be able to run this script on larger batches of text files so I can
> > compare OCR engines (not count words manually).
> >
> > Let me know if you have any existing tools or thoughts about how to go
> > about this!
> >
> > Thanks,
> >
> > Kim
> >
> > Kimberly Kennedy
> > Digital Production Coordinator
> > Northeastern University Library
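
For the quick and dirty dictionary check Kim describes, a minimal sketch in Python might look like the following. It is not taken from the notebook or the fastwer article; the function names (load_wordlist, dictionary_hit_rate, score_directory) are made up for illustration. It assumes a plain-text word list with one word per line (for example /usr/share/dict/words, which exists on many Unix-like systems) and a folder of UTF-8 .txt files per OCR engine. Tokens with no letters (page numbers, dates) are skipped so they do not count as misses.

```python
import string
from pathlib import Path


def load_wordlist(path):
    """Read a one-word-per-line file into a lowercase set (assumed format)."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def dictionary_hit_rate(text, wordlist):
    """Return (hits, total, ratio) for the word-like tokens in text.

    Tokens are split on whitespace and stripped of surrounding ASCII
    punctuation, so an OCR error such as "qu1ck" stays one token and
    counts as a miss.
    """
    tokens = []
    for raw in text.split():
        tok = raw.strip(string.punctuation).lower()
        # Skip empty tokens and tokens with no letters at all.
        if tok and any(c.isalpha() for c in tok):
            tokens.append(tok)
    if not tokens:
        return 0, 0, 0.0
    hits = sum(1 for tok in tokens if tok in wordlist)
    return hits, len(tokens), hits / len(tokens)


def score_directory(directory, wordlist):
    """Score every .txt file in a directory; returns {filename: (hits, total, ratio)}."""
    results = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = dictionary_hit_rate(text, wordlist)
    return results
```

To compare engines, one could call score_directory once per engine's output folder with the same word list and compare the ratios file by file. The ratio is only a rough proxy: proper nouns, archaic spellings, and hyphenated line breaks will count as misses, and a misread that happens to be a real word will count as a hit. Where ground-truth transcriptions are available, character and word error rates (as in the article above) are the more direct measure.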