-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 3/21/2014 2:34 PM, Andrew Gordon wrote:
> Ken,
>
> A group in Chicago has been working for a few years now on a
> deduplication toolkit that might do what you are looking for. They
> also have a couple of versions that work with an Excel file or .csv
> file.
>
> https://github.com/datamade/dedupe
> https://github.com/datamade/dedupe-web
> https://github.com/datamade/csvdedupe
>
> I have not worked with them extensively, but I have heard others
> find these very useful for entity recognition and resolution.
+1
Attended this very interesting talk on just that
http://pyvideo.org/video/973/big-data-de-duping
./fxk
- --
QOTD:
"A child of 5 could understand this! Fetch me a child of 5."
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQEcBAEBAgAGBQJTLIj9AAoJEOptrq/fXk6Mjl4H/jMa3b+ekRYNnnvLBdMXUr/C
p+0tAu3SI5GkfbWe1JGLU6cPcM0Ret22RxKg+QslADZ00aGj2RM8sh+4fV0neFXB
/sA7wHh/8thtFW1njKpaLQZg5f+px6zB8ch9wdp4yf7L0pPb1612fxGRHMjH5u51
vFUAF3r6wM3JIYjAEPKhzq5511soASisV0IWMEyAoRYNyjKbOyan/gN97G/oYxXp
MvwxFAwiOPgwL83Set0kMqztCA2aW76uFwwgvWkhGIcywBR7w7Adl1/MTM9oLBtd
lyeimBXWKvqvArai9txMcC4mOLkZq03FAWypVhe+VOBm4xmmDhowr3YeaaJWl3k=
=Kv3q
-----END PGP SIGNATURE-----