I'm working on a suite of tools to help people clean and normalize
data they find in the wild (the plan is eventually to open source it).
I'd like to show you what I have so far and hear which features
you'd want that it's still missing.
Essentially the premise is that you should be able to teach the
computer to parse and munge text data, just by showing it some
examples of how to do such transformations. So, for example, you
should be able to normalize names and phone numbers just by showing it
examples of how you want the data to look in the end.
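
To make the premise concrete, here is a toy sketch in Python. It is not
the tool's actual implementation, just an illustration of the idea under
a strong assumption: the desired output reuses the input's digits in the
same order and only changes the punctuation around them. The function
names learn_template and apply_template are made up for this example.

```python
def learn_template(example_in, example_out):
    """Learn an output layout from a single (input, output) example.

    Assumption (for this sketch only): the output contains exactly the
    digits of the input, in the same order, plus some literal punctuation.
    """
    digits_in = [c for c in example_in if c.isdigit()]
    digits_out = [c for c in example_out if c.isdigit()]
    if digits_in != digits_out:
        raise ValueError("example output must reuse the input digits in order")
    # None marks a slot to be filled with a digit; anything else is literal.
    return [None if c.isdigit() else c for c in example_out]


def apply_template(template, text):
    """Reformat new text using a layout learned by learn_template."""
    digits = [c for c in text if c.isdigit()]
    if len(digits) != template.count(None):
        raise ValueError("input has a different number of digits than the example")
    remaining = iter(digits)
    return "".join(next(remaining) if part is None else part for part in template)


# Show the computer one example of how the data should look in the end...
template = learn_template("555.123.4567", "(555) 123-4567")

# ...and let it apply the same transformation to other rows.
for raw in ["555 987 6543", "555-222-3333"]:
    print(apply_template(template, raw))
```

A real tool has to cope with far messier cases (reordered fields, names,
optional parts), which a fixed digit template like this cannot express.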
There's a video demo here:
The most impressive part is towards the middle, where it learns to
perform complex text transformations. The text is a little small --
sorry, I tried to zoom in but the video editor kept crashing. You
might have to download the video and watch it on your own computer
to read it.
Theory is the first term in the Taylor series of practice. -- Thomas M