Kim-

Yes, I have a script that Jon Stroop wrote to do something very similar: https://github.com/pulibrary/pulfa-sausage-factory/blob/master/bin/orient_image.sh

The use case was a little different (orient the page each possible way and pick the orientation whose OCR output had the most words in the dictionary), but I think it should be easy to adapt to your needs.
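
For the batch case, here is a rough Python sketch of the same dictionary-count idea. It is not Jon's script, just an illustration: the function names (load_dictionary, dictionary_ratio, score_directory) are made up for this example, and it assumes a plain one-word-per-line word list such as /usr/share/dict/words, which may not exist on your system or suit your language and time period.

```python
import re
from pathlib import Path

# Runs of ASCII letters, optionally with one internal apostrophe ("don't").
# This is a simplifying assumption; accented or hyphenated words need more care.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def load_dictionary(path):
    """Read a one-word-per-line word list into a lowercase set."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return {line.strip().lower() for line in f if line.strip()}


def dictionary_ratio(text, dictionary):
    """Return (hits, total, ratio) for the words found in text."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0, 0, 0.0
    hits = sum(1 for w in words if w in dictionary)
    return hits, len(words), hits / len(words)


def score_directory(text_dir, dictionary):
    """Score every .txt file in text_dir; returns {filename: (hits, total, ratio)}."""
    results = {}
    for path in sorted(Path(text_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = dictionary_ratio(text, dictionary)
    return results


# Example use (paths are placeholders):
#   words = load_dictionary("/usr/share/dict/words")
#   for name, (hits, total, ratio) in score_directory("ocr_output", words).items():
#       print(f"{name}\t{hits}\t{total}\t{ratio:.3f}")
```

Running this once per OCR engine's output directory gives a per-file ratio you can compare. Keep in mind the ratio is only a proxy: proper names and period spellings count as misses, and a misrecognized word that happens to be another real word counts as a hit.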

-Esmé

> On Sep 2, 2021, at 4:07 PM, Kimberly Kennedy <[log in to unmask]> wrote:
> 
> Hello!
> 
> I was wondering if anyone has created a script or tool to compare the words
> in a text file to a dictionary? I'm looking for a way to quantify the
> quality of OCR output. I've heard that counting the number of words that
> are in the dictionary is a good quick and dirty way to do this, but I would
> like to be able to run this script on larger batches of text files so I can
> compare OCR engines (not count words manually).
> 
> Let me know if you have any existing tools or thoughts about how to go
> about this!
> 
> Thanks,
> 
> Kim
> 
> 
> 
> Kimberly Kennedy
> Digital Production Coordinator
> Northeastern University Library
> [log in to unmask]