BTW, here is the transcript:

https://www.grc.com/sn/sn-940-notes.pdf

Page 12 is where Ray's project discussion starts.


On Friday, September 22, 2023 at 09:43, Ray Voelker eloquently inscribed:

> Hi code4lib folks .. and again ... happy Friday!!
> 
> I just wanted to post an update to this. I wrote in to the Security Now!
> podcast (fantastic show by the way and fully worth listening to on a
> regular basis) about this notion, and it was made the main topic of show
> number 940!
> 
> https://twit.tv/shows/security-now/episodes/940?autostart=false
> 
> The discussion starts around the 1:36 mark.
> 
> Here's what I wrote to Steve Gibson:
> 
>> In addition to being an avid listener to Security Now, I'm also a System
>> Administrator for a large public library system in Ohio. Libraries
>> often struggle with data—being especially sensitive around data related
>> to patrons and patron behavior in terms of borrowing, library program
>> attendance, reference questions, etc. The common practice is for
>> libraries to aggregate and then promptly destroy this data within a
>> short time frame—which is typically one month. However, administrators
>> and local government officials, who are often instrumental in
>> allocating library funding and guiding operational strategies,
>> frequently ask questions on a larger time scale than one month to
>> validate the library's significance and its operational strategies.
>> Disaggregation of this data to answer these types of questions is very
>> difficult and arguably impossible. This puts people like me, and many
>> others like me, in a tough spot in terms of storing and later using
>> sensitive data to provide the answers to these questions of pretty
>> serious consequence—like, what should we spend money on, or why we
>> should continue to exist.
> 
>> I’m sure you’re aware, but there are many interesting historical reasons
>> for this sensitivity, and organizations like the American Library
>> Association (ALA) and other international library associations have
>> even codified the protection of patron privacy into their codes of
>> ethics. For example, the ALA's Code of Ethics states: "We protect each
>> library user's right to privacy and confidentiality with respect to
>> information sought or received and resources consulted, borrowed,
>> acquired or transmitted." While I deeply respect and admire this
>> stance, it doesn't provide a solution for those of us wrestling with
>> the aforementioned existential questions.
>> 
> 
>> In this context, I'd be immensely grateful if you could share your insights
>> on the technique of "Pseudonymization"
>> (https://en.wikipedia.org/wiki/Pseudonymization <https://t.co/gVKvpmzoxp>) for
>> PII data. Additionally, I'd appreciate a brief review of a Python
>> module I'm developing, which aims to assist me (and potentially other
>> library professionals) in retaining crucial data for subsequent
>> analysis while ensuring data subject privacy.
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>> <https://t.co/aAapRKgElr> Thank you once again, Steve, for your
>> invaluable contributions to the security community. I eagerly await
>> your feedback!
>> 
>> 
> I think an even better solution than Pseudonymization involves the
> Birthday Paradox. It's a direction I hadn't even thought of for this!
> 
> --Ray
> 
> On Fri, Sep 15, 2023 at 2:43 PM Ray Voelker <[log in to unmask]> wrote:
> 
>> Hi code4lib folks .. happy Friday!
>> 
>> I started putting together a little Python utility for doing
>> Pseudonymization tasks
>> (https://en.wikipedia.org/wiki/Pseudonymization). The goal is to be
>> able to do more analysis on data related to circulation while securely
>> maintaining patron privacy.
>> 
>> For a little bit of background I wanted something *like a hash* (but
>> more secure than a hash), for replacing select fields related to patron
>> records. I also wanted something that could possibly be reversed given
>> an encrypted private key that would be stored well outside of the scope
>> of the project. I'm thinking that if you wanted to geocode addresses
>> for example, you could temporarily decrypt each field needed for the
>> task, use the *pseudonymized* patron id as the identifier, and then
>> send your data off to the geocoder of your choice. Another example
>> would be to store a pseudonymized patron id as the identifier in things
>> like circulation data used for later analysis, or for transmitting to
>> trusted 3rd parties who may do analysis for you.
>> 
>> I'm humbly asking for anyone with some background in using encryption to
>> review the code I have and maybe offer some comments / concerns /
>> suggestions / jokes about this.
>> 
>> Thanks in advance!
>> 
>> https://gist.github.com/rayvoelker/80c0dfa5cb47e63c7e498bd064d3c0b6
>> 
>> --
>> Ray Voelker
>> 
> 
>
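
For anyone who wants to picture the "like a hash, but more secure than a hash" idea from Ray's original post quoted above, here is a minimal sketch using a keyed hash (HMAC-SHA256) from the Python standard library. This is not the code in Ray's gist; the function name `pseudonymize`, the sample records, and the key handling are all assumptions made for illustration. It covers only the one-way replacement of a patron id, not the reversible, private-key part of the idea, which would most likely need a third-party encryption library.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable keyed pseudonym for value (hypothetical helper)."""
    # Trim whitespace and lowercase so the same id always maps to the
    # same pseudonym regardless of how it was typed.
    normalized = value.strip().lower().encode("utf-8")
    # HMAC mixes a secret key into the hash, so someone without the key
    # cannot rebuild the mapping by hashing a list of known patron ids.
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Assumption: a random key generated once and stored well outside the
# scope of the project, as the post suggests for the private key.
key = secrets.token_bytes(32)

# Made-up circulation rows, for illustration only.
checkouts = [
    {"patron_id": "p1001", "item": "book-a"},
    {"patron_id": "p1002", "item": "book-b"},
    {"patron_id": "p1001", "item": "book-c"},
]

# Replace the real patron id with its pseudonym before storing the row.
pseudonymized = [
    {"patron": pseudonymize(row["patron_id"], key), "item": row["item"]}
    for row in checkouts
]

for row in pseudonymized:
    print(row["patron"][:12], row["item"])
```

The same patron always gets the same pseudonym under a given key, so checkouts can still be grouped per patron for later analysis. Destroying or rotating the key breaks the link between old pseudonyms and real ids.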