I am working on a project to digitize concert programs. These are the kind
of programs you get when attending a concert, listing the performers and
details about the concert.
Since these items are text-heavy, we have decided to use OCR software to
output a text file that will enable full-text searching in our platform.
These text files are for the most part accurate, but often have unnecessary
line breaks and pockets of extra characters and/or incorrect
capitalization. I would like to pretty them up a little bit if possible.
I am wondering if there is a script I can run on multiple files to clean
these kinds of things up. I don't want to have the digitization staff
manually edit each text file or have to open each one to run a macro in a
I have been searching online and so far haven't found anything that will
work for my situation.
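
To make the request concrete, here is a rough Python sketch of the kind of
batch cleanup I have in mind. It is only an illustration under some
assumptions, not something tested on our files: the function names
(clean_ocr_text, clean_folder) and the folder layout are made up, the
files are assumed to be UTF-8 .txt files, blank lines are assumed to
separate paragraphs, and "extra characters" is assumed to mean runs of
three or more punctuation marks or symbols. It does not attempt to fix
capitalization.

```python
import re
from pathlib import Path


def clean_ocr_text(text: str) -> str:
    """Tidy one OCR'd text: rejoin broken lines and drop symbol noise."""
    # Normalize Windows/Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break ("perfor-\nmers").
    # Caveat: this also merges real hyphenated words split at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Assumption: 3+ consecutive non-word, non-space characters are noise.
    # Caveat: this also removes dot leaders and ellipses.
    text = re.sub(r"[^\w\s]{3,}", " ", text)
    # Treat blank lines as paragraph breaks; within a paragraph, collapse
    # line breaks and repeated spaces into single spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p) + "\n"


def clean_folder(src_dir: str, dst_dir: str) -> int:
    """Clean every .txt file in src_dir, writing results to dst_dir.

    The originals are left untouched. Returns the number of files written.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.txt")):
        raw = path.read_text(encoding="utf-8", errors="replace")
        (dst / path.name).write_text(clean_ocr_text(raw), encoding="utf-8")
        count += 1
    return count
```

Something like clean_folder("ocr_output", "ocr_cleaned") would then process
a whole folder in one pass (both folder names are placeholders). Joining
every line within a paragraph may be too aggressive for performer lists,
where each line is its own entry, so that step would likely need tuning.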
Thanks in advance,
Multnomah County Library