On 3/21/2014 2:34 PM, Andrew Gordon wrote:
> Ken,
>
> A group in Chicago has been working for a few years now on a
> deduplication toolkit that might do what you are looking for. They
> also have a couple of versions that work with an Excel file or .csv
> file.
>
> https://github.com/datamade/dedupe
> https://github.com/datamade/dedupe-web
> https://github.com/datamade/csvdedupe
>
> I have not worked with them extensively, but I have heard others
> find these very useful for entity recognition and resolution.

+1

Attended this very interesting talk on just that:
http://pyvideo.org/video/973/big-data-de-duping

./fxk

-- 
QOTD:
"A child of 5 could understand this! Fetch me a child of 5."