Hi code4lib folks .. and again ... happy Friday!!

I just wanted to post an update to this. I wrote in to the Security Now! podcast (fantastic show by the way and fully worth listening to on a regular basis) about this notion, and it was made the main topic of show number 940!

https://twit.tv/shows/security-now/episodes/940?autostart=false

The discussion starts around the 1:36 mark.

Here's what I wrote to Steve Gibson:

> In addition to being an avid listener to Security Now, I'm also a System Administrator for a large public library system in Ohio. Libraries often struggle with data, being especially sensitive around data related to patrons and patron behavior in terms of borrowing, library program attendance, reference questions, etc. The common practice is for libraries to aggregate and then promptly destroy this data within a short time frame, which is typically one month. However, administrators and local government officials, who are often instrumental in allocating library funding and guiding operational strategies, frequently ask questions on a larger time scale than one month to validate the library's significance and its operational strategies. Disaggregation of this data to answer these types of questions is very difficult and arguably impossible. This puts people like me, and many others like me, in a tough spot in terms of storing and later using sensitive data to provide the answers to these questions of pretty serious consequence, like, what should we spend money on, or why we should continue to exist.
>
> I’m sure you’re aware, but there are many interesting historical reasons for this sensitivity, and organizations like the American Library Association (ALA) and other international library associations have even codified the protection of patron privacy into their codes of ethics.
> For example, the ALA's Code of Ethics states: "We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted." While I deeply respect and admire this stance, it doesn't provide a solution for those of us wrestling with the aforementioned existential questions.
>
> In this context, I'd be immensely grateful if you could share your insights on the technique of "Pseudonymization" (https://en.wikipedia.org/wiki/Pseudonymization <https://t.co/gVKvpmzoxp>) for PII data. Additionally, I'd appreciate a brief review of a Python module I'm developing, which aims to assist me (and potentially other library professionals) in retaining crucial data for subsequent analysis while ensuring data subject privacy. https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6 <https://t.co/aAapRKgElr> Thank you once again, Steve, for your invaluable contributions to the security community. I eagerly await your feedback!

I think the even better solution compared to Pseudonymization involves the Birthday Paradox. It's a direction I hadn't even thought of for this!

--Ray

On Fri, Sep 15, 2023 at 2:43 PM Ray Voelker <[log in to unmask]> wrote:

> Hi code4lib folks .. happy Friday!
>
> I started putting together a little Python utility for doing Pseudonymization tasks (https://en.wikipedia.org/wiki/Pseudonymization). The goal is to be able to do more analysis on data related to circulation while securely maintaining patron privacy.
>
> For a little bit of background I wanted something *like a hash* (but more secure than a hash), for replacing select fields related to patron records. I also wanted something that could possibly be reversed given an encrypted private key that would be stored well outside of the scope of the project.
> I'm thinking that if you wanted to geocode addresses for example, you could temporarily decrypt each field needed for the task, use the *pseudonymized* patron id as the identifier, and then send your data off to the geocoder of your choice. Another example would be to store a pseudonymized patron id as the identifier in things like circulation data used for later analysis, or for transmitting to trusted 3rd parties who may do analysis for you.
>
> I'm humbly asking for anyone with some background in using encryption to review the code I have and maybe offer some comments / concerns / suggestions / jokes about this.
>
> Thanks in advance!
>
> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>
> --
> Ray Voelker

--
Ray Voelker
(937) 620-1830
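
For anyone who wants a concrete picture of the "like a hash, but more secure than a hash" idea in the quoted message, here is a minimal sketch using a keyed hash (HMAC-SHA256) from the Python standard library. This is not the code in the gist. The function name `pseudonymize`, the sample patron id, and the key handling are all hypothetical. Note also that an HMAC is one-way: it gives a stable stand-in identifier, but it can't be reversed the way the private-key approach described above could be.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Return a deterministic, keyed pseudonym for `value` as a hex string.

    Hypothetical helper: without `key`, nobody can recompute the pseudonym
    by simply hashing guessed patron ids, which is the weakness of a plain
    (unkeyed) hash over a small id space.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: a random 32-byte key, generated once and stored well outside
# the project (generated inline here only so the example runs).
key = secrets.token_bytes(32)

patron_id = "p1234567"  # made-up patron identifier
token = pseudonymize(patron_id, key)

# Same id + same key always yields the same token, so circulation rows
# can still be grouped per patron without storing the real id.
print(token)
```

The same patron id always maps to the same token under one key, so it can serve as the identifier in circulation data kept for later analysis. Destroying or rotating the key breaks the link between old tokens and patron ids.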