You may find what you need from OpenRefine: http://openrefine.org/
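
If a scripted route suits the workflow better, the batch cleanup described in the quoted message below could be sketched roughly like this in Python. This is only a minimal sketch under stated assumptions, not a tested tool for these particular files: the function names (clean_ocr_text, clean_folder), the junk-character pattern, and the rule that a blank line separates paragraphs are all illustrative guesses. It joins stray line breaks, collapses extra whitespace, and drops runs of odd symbols; it does not attempt to fix capitalization.

```python
import re
from pathlib import Path

# Assumption: a "pocket of extra characters" is a run of 3 or more characters
# that are not letters, digits, whitespace, or common punctuation.
JUNK_RUN = re.compile(r"[^\w\s.,;:!?'\"()&/-]{3,}")


def clean_ocr_text(text: str) -> str:
    """Return a tidied copy of OCR output (hypothetical helper)."""
    text = text.replace("\r\n", "\n")
    # Drop runs of stray symbols before looking at line structure.
    text = JUNK_RUN.sub("", text)
    # Assumption: blank lines mark real paragraph breaks; single line
    # breaks inside a paragraph are treated as unnecessary and joined.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in cleaned if p)


def clean_folder(src: str, dst: str) -> int:
    """Clean every .txt file in src, writing results to dst.

    The originals are left untouched. Returns the number of files processed.
    """
    src_dir, dst_dir = Path(src), Path(dst)
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src_dir.glob("*.txt")):
        raw = path.read_text(encoding="utf-8", errors="replace")
        (dst_dir / path.name).write_text(clean_ocr_text(raw) + "\n", encoding="utf-8")
        count += 1
    return count
```

Calling clean_folder("ocr_output", "ocr_cleaned") (both folder names are placeholders) would process a whole directory in one pass, so nobody has to open each file. Since concert programs often list performers one per line, joining line breaks may be too aggressive for some pages; it is worth comparing a few cleaned files against the originals before running it on everything.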
On Fri, Nov 21, 2014 at 5:15 PM, Erica FINDLEY wrote:
> I am working on a project to digitize concert programs. These are the type
> of program you get when attending a musical concert that lists performers
> and details about the concert.
> Since these items are text heavy we have decided to use OCR software to
> output a text file that will enable full text searching in our platform.
> These text files are for the most part accurate, but often have unnecessary
> line breaks and pockets of extra characters and/or incorrect
> capitalization. I would like to pretty them up a little bit if possible.
> I am wondering if there is a script I can use on multiple files to clean
> these types of things up. I don't want to have the digitization staff
> manually edit each text file or have to open each one to run a macro in a
> text editor.
> I have been searching online and so far haven't found anything that will
> work for my situation.
> thanks in advance,
> *Erica Findley*
> Cataloging/Metadata Librarian
> Multnomah County Library
> Phone: 503.988.5466