LISTSERV 16.5 - CODE4LIB Archives

Sorry, I’m told the links got lost in transit:

Notebook: https://data.nls.uk/tools/jupyter-notebooks/exploring-britain-and-uk-handbooks/

fastwer article: https://towardsdatascience.com/evaluating-ocr-output-quality-with-character-error-rate-cer-and-word-error-rate-wer-853175297510 (it also links to a GitHub repo with the code)

Sarah
On Sep 3, 2021, 10:20 AM -0500, Sarah Swanz <[log in to unmask]>, wrote:
> This Jupyter notebook from the National Library of Scotland has a section on how to evaluate OCR accuracy under the Data Cleaning chapter.
>
> You might also check out the 'fastwer' package described in this article. I have not used it myself, so I cannot vouch for it.
>
> Sarah Swanz
> University of Michigan, School of Information (2018)
> On Sep 2, 2021, 3:09 PM -0500, Kimberly Kennedy <[log in to unmask]>, wrote:
> > Hello!
> >
> > I was wondering if anyone has created a script or tool to compare the words
> > in a text file to a dictionary? I'm looking for a way to quantify the
> > quality of OCR output. I've heard that counting the number of words that
> > are in the dictionary is a good quick and dirty way to do this, but I would
> > like to be able to run this script on larger batches of text files so I can
> > compare OCR engines (not count words manually).
> >
> > Let me know if you have any existing tools or thoughts about how to go
> > about this!
> >
> > Thanks,
> >
> > Kim
> >
> >
> >
> > Kimberly Kennedy
> > Digital Production Coordinator
> > Northeastern University Library
> > [log in to unmask]
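
For the "quick and dirty" dictionary check asked about in the original question, a minimal sketch might look like the following. It is only one way to do it, not a tested tool: the function names are made up for illustration, the word list is assumed to be a plain text file with one word per line (for example /usr/share/dict/words on many Unix systems), and the tokenizer only recognizes unaccented Latin letters, so it would need adjusting for other languages.

```python
import re
from pathlib import Path

# Assumption: a "word" is a run of ASCII letters, optionally with one
# internal apostrophe (e.g. "don't"). Digits and punctuation split tokens.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def load_wordlist(path):
    """Read a one-word-per-line file into a lowercase set."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def dictionary_hit_rate(text, wordlist):
    """Return the fraction of tokens in `text` that appear in `wordlist`."""
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in wordlist)
    return hits / len(tokens)


def score_directory(folder, wordlist):
    """Score every .txt file in `folder`; returns {filename: hit rate}."""
    return {
        p.name: dictionary_hit_rate(
            p.read_text(encoding="utf-8", errors="replace"), wordlist
        )
        for p in sorted(Path(folder).glob("*.txt"))
    }
```

To compare OCR engines, you could keep each engine's output in its own folder, call score_directory() on each folder with the same word list, and compare the per-file or average rates. Proper nouns, historical spellings, and hyphenated line breaks will all lower the score, so the number is probably more useful for comparing engines on the same pages than as an absolute accuracy figure.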