I was impressed with the video! Such a program could be very useful for my everyday tasks. If it is open source, please send whatever is available, or just keep me updated (include my email in your distribution list). I am not a programmer at all, but I have some knowledge of regular expressions and Python. I work a lot with MarcEdit. Our ILS is Voyager. You can email me if you need some testing or other simple technical help.
Cataloguing and information services
University of Waterloo Library, ON Canada
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Alex Clemmer
Sent: Wednesday, January 21, 2015 4:27 AM
To: [log in to unmask]
Subject: [CODE4LIB] A prototype tool to help clean noisy data
I'm working on a suite of tools to help people clean and normalize data they find in the wild (the plan is eventually to open source it).
I'm hoping that if I show you what I have so far, you can all tell me what's missing that you'd like to see.
Essentially the premise is that you should be able to teach the computer to parse and munge text data, just by showing it some examples of how to do such transformations. So, for example, you should be able to normalize names and phone numbers just by showing it examples of how you want the data to look in the end.
There's a video demo here:
The most impressive part is towards the middle where it learns to perform complex text transformations. The text is a little small -- sorry, I tried to zoom in but the video editor kept crashing. You might have to download and watch the video on your actual computer.
Theory is the first term in the Taylor series of practice. -- Thomas M Cover (1992)