Hi code4lib folks .. and again ... happy Friday!!
I just wanted to post an update to this. I wrote in to the Security Now!
podcast (fantastic show by the way and fully worth listening to on a
regular basis) about this notion, and it was made the main topic of show
number 940!
https://twit.tv/shows/security-now/episodes/940?autostart=false
The discussion starts around the 1:36 mark.
Here's what I wrote to Steve Gibson:
> In addition to being an avid listener to Security Now, I'm also a System
> Administrator for a large public library system in Ohio. Libraries often
> struggle with data—being especially sensitive around data related to
> patrons and patron behavior in terms of borrowing, library program
> attendance, reference questions, etc. The common practice is for libraries
> to aggregate and then promptly destroy this data within a short time
> frame—which is typically one month. However, administrators and local
> government officials, who are often instrumental in allocating library
> funding and guiding operational strategies, frequently ask questions on a
> larger time scale than one month to validate the library's significance and
> its operational strategies. Disaggregation of this data to answer these
> types of questions is very difficult and arguably impossible. This puts
> people like me, and many others like me, in a tough spot in terms of
> storing and later using sensitive data to provide the answers to these
> questions of pretty serious consequence—like, what should we spend money
> on, or why we should continue to exist.
>
> I’m sure you’re aware, but there are many interesting historical reasons
> for this sensitivity, and organizations like the American Library
> Association (ALA) and other international library associations have even
> codified the protection of patron privacy into their codes of ethics. For
> example, the ALA's Code of Ethics states: "We protect each library user's
> right to privacy and confidentiality with respect to information sought or
> received and resources consulted, borrowed, acquired or transmitted." While
> I deeply respect and admire this stance, it doesn't provide a solution for
> those of us wrestling with the aforementioned existential questions.
>
> In this context, I'd be immensely grateful if you could share your insights
> on the technique of "Pseudonymization"
> (https://en.wikipedia.org/wiki/Pseudonymization) for PII
> data. Additionally, I'd appreciate a brief review of a Python module I'm
> developing, which aims to assist me (and potentially other library
> professionals) in retaining crucial data for subsequent analysis while
> ensuring data subject privacy:
> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
> Thank you once
> again, Steve, for your invaluable contributions to the security community.
> I eagerly await your feedback!
>
I think an even better solution than Pseudonymization involves the
Birthday Paradox. It's a direction I hadn't even thought of for this!
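
For anyone who wants something concrete, below is a minimal sketch of one way
the Birthday Paradox idea could be applied. It is an illustration under stated
assumptions, not the code from the gist and not necessarily the exact scheme
discussed on the show. The idea assumed here: run each patron id through a
keyed hash (HMAC-SHA256) and keep only a few bits, so that many patrons
deliberately collide into the same bucket. The names bucket_id,
collision_probability, and SECRET_KEY are hypothetical, as is the choice of
12 bits.

```python
import hashlib
import hmac
import math

# Hypothetical secret for illustration only. In practice it would be
# generated randomly and stored well outside the scope of the project.
SECRET_KEY = b"replace-with-a-random-secret"


def bucket_id(patron_id: str, bits: int = 12, key: bytes = SECRET_KEY) -> int:
    """Map a patron id to one of 2**bits buckets via a truncated keyed hash."""
    digest = hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).digest()
    # SHA-256 yields 256 bits; keep only the top `bits` bits so that
    # many different patrons intentionally share the same bucket.
    return int.from_bytes(digest, "big") >> (256 - bits)


def collision_probability(n_patrons: int, bits: int) -> float:
    """Birthday-paradox approximation of the chance that at least two of
    n_patrons land in the same bucket, given 2**bits buckets."""
    buckets = 2 ** bits
    return 1.0 - math.exp(-n_patrons * (n_patrons - 1) / (2 * buckets))


if __name__ == "__main__":
    # "p1234567" is a made-up patron id.
    print(bucket_id("p1234567"))
    print(collision_probability(1000, 12))
```

With, say, 100,000 patrons and 12 bits there are 4,096 buckets, so each bucket
holds roughly 24 patrons on average. A stored bucket id is stable over time
(useful for longer-term trend questions) but cannot point to a single person.
Fewer bits means more privacy and coarser analysis; more bits means the
reverse.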
--Ray
On Fri, Sep 15, 2023 at 2:43 PM Ray Voelker wrote:
> Hi code4lib folks .. happy Friday!
>
> I started putting together a little Python utility for doing
> Pseudonymization tasks (https://en.wikipedia.org/wiki/Pseudonymization).
> The goal is to be able to do more analysis on data related to circulation
> while securely maintaining patron privacy.
>
> For a little bit of background I wanted something *like a hash* (but more
> secure than a hash), for replacing select fields related to patron records.
> I also wanted something that could possibly be reversed given an encrypted
> private key that would be stored well outside of the scope of the project.
> I'm thinking that if you wanted to geocode addresses for example, you could
> temporarily decrypt each field needed for the task, use the
> *pseudonymized* patron id as the identifier, and then send your data off
> to the geocoder of your choice. Another example would be to store a
> pseudonymized patron id as the identifier in things like circulation data
> used for later analysis, or for transmitting to trusted 3rd parties who may
> do analysis for you.
>
> I'm humbly asking for anyone with some background in using encryption to
> review the code I have and maybe offer some comments / concerns /
> suggestions / jokes about this.
>
> Thanks in advance!
>
> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>
> --
> Ray Voelker
>
--
Ray Voelker
(937) 620-1830