Hi code4lib folks .. happy Friday!
I've started putting together a little Python utility for
pseudonymization tasks (https://en.wikipedia.org/wiki/Pseudonymization).
The goal is to be able to do more analysis on data related to circulation
while securely maintaining patron privacy.
For a little bit of background, I wanted something *like a hash* (but more
secure than a plain hash) for replacing selected fields in patron records.
I also wanted something that could possibly be reversed given an encrypted
private key that would be stored well outside of the scope of the project.
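To make the "like a hash, but more secure" idea concrete, here is a minimal sketch that assumes a keyed hash (HMAC-SHA256 from the Python standard library). This is an illustration only and not necessarily what the gist does: the function name `pseudonymize`, the sample patron id, and the way the key is generated are all assumptions. Note that an HMAC is one-way, so it covers the pseudonym part but not the reversal described above.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Return a deterministic, keyed pseudonym for value (HMAC-SHA256, hex)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for the example; a real key would be loaded from
# somewhere outside the project rather than generated on each run.
key = secrets.token_bytes(32)

patron_id = "p1234567"  # made-up patron record number
token = pseudonymize(patron_id, key)

# The same input and key always give the same token, so rows can still be joined.
assert token == pseudonymize(patron_id, key)

# A different key gives an unrelated token, which is what makes guessing
# patron ids by hashing candidate values impractical without the key.
other_key = secrets.token_bytes(32)
assert token != pseudonymize(patron_id, other_key)
```

Unlike a plain hash of the patron id, someone without the key cannot rebuild the mapping by hashing every plausible id.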
I'm thinking that if you wanted to geocode addresses, for example, you could
temporarily decrypt each field needed for the task, use the *pseudonymized*
patron id as the identifier, and then send your data off to the geocoder of
your choice. Another example would be to store a pseudonymized patron id as
the identifier in things like circulation data used for later analysis, or
for transmitting to trusted third parties who may do analysis for you.
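The circulation-data case might look roughly like the sketch below. It is a hedged example under stated assumptions: the field names (`patron_id`, `name`, `item`, `checkout_date`), the helper `to_shareable`, the hard-coded key, and the sample rows are all invented for illustration, and the pseudonym is again an HMAC-SHA256 stand-in rather than the method in the gist.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed, deterministic pseudonym (HMAC-SHA256, hex). Illustrative only."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_shareable(rows, key, id_field="patron_id", drop_fields=("name",)):
    """Yield copies of rows with the id pseudonymized and direct identifiers dropped."""
    for row in rows:
        out = {k: v for k, v in row.items() if k not in drop_fields}
        out[id_field] = pseudonymize(str(row[id_field]), key)
        yield out


# Hypothetical key, hard-coded only so the example runs as-is.
key = b"example-key-not-for-real-use"

# Made-up circulation rows.
circulation = [
    {"patron_id": "p1001", "name": "A. Reader", "item": "i501", "checkout_date": "2021-03-05"},
    {"patron_id": "p1002", "name": "B. Borrower", "item": "i777", "checkout_date": "2021-03-05"},
    {"patron_id": "p1001", "name": "A. Reader", "item": "i902", "checkout_date": "2021-03-12"},
]

shared = list(to_shareable(circulation, key))

for row in shared:
    # Print a shortened token so the output stays readable.
    print(row["patron_id"][:12], row["item"], row["checkout_date"])
```

Because the pseudonym is deterministic for a given key, the two checkouts by the same patron still share an identifier, so per-patron analysis works without the real id or name leaving your system.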
I'm humbly asking for anyone with some background in using encryption to
review the code I have and maybe offer some comments / concerns /
suggestions / jokes about this.
Thanks in advance!
https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
--
Ray Voelker