Hi everyone,
Dr. Kayaalp asked me to forward his email about the NLM-Scrubber, a tool
for removing PII and health information, to the list.
Thanks!
Kim
Kimberly Kennedy
[log in to unmask]
---------- Forwarded message ---------
From: Kayaalp, Mehmet (NIH/NLM/LHC) [E] <[log in to unmask]>
Date: Mon, Apr 22, 2019 at 10:37 AM
Subject: RE: Looking for lightweight tool to identify PII
To: [log in to unmask] <[log in to unmask]>
Hi Kimberly,
A colleague of mine forwarded your email to me.
You may find NLM-Scrubber, https://scrubber.nlm.nih.gov/, helpful to you,
but it is not a turnkey approach for your problem.
Although NLM-Scrubber does not deal with anything but ASCII format, you may
not be able to find better freeware anywhere to de-identify your documents.
There should be a number of tools to convert PDF to ASCII format. If you
are willing to work on that prerequisite on your own, I would be happy to
help you solve your de-identification problem.
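To illustrate the ASCII prerequisite mentioned above, here is a minimal Python sketch. It assumes the text has already been extracted from the PDF by some other tool (for scanned pages, that would mean OCR) and only shows one possible way to fold that text down to plain ASCII. The function name `to_ascii` is hypothetical and is not part of NLM-Scrubber.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Fold text to plain ASCII (hypothetical helper, not part of NLM-Scrubber)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so that an accented "e" becomes a plain "e".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. curly quotes) is replaced with "?".
    return stripped.encode("ascii", "replace").decode("ascii")
```

Replacing leftover characters with "?" rather than silently dropping them keeps a visible trace of where the conversion lost information, which may be useful when reviewing the output by hand.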
Best,
--mehmet
Mehmet Kayaalp, M.D., Ph.D.
Lister Hill National Center for Biomedical Communications
Building 38A
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894-3828
[log in to unmask]
Date: Fri, 19 Apr 2019 13:26:22 -0400
From: Kimberly Kennedy <[log in to unmask]>
Subject: Looking for lightweight tool to identify PII
Hello!
We are beginning a digitization project at my institution that involves
scanning archival documents that may contain personal identifying
information, such as social security numbers or credit card numbers. I'm
looking for a tool that will examine the PDFs and identify the ones that
may contain PII, so we can then redact them.
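As a rough illustration of the kind of check described above, the following Python sketch flags text that looks like it contains a social security number or a credit card number. It assumes the scanned pages have already been converted to text by a separate OCR step, and the names `SSN_RE`, `CARD_RE`, `luhn_ok`, and `find_pii` are hypothetical. The patterns are deliberately simple and will miss unusual formats, so this is a starting point for triage rather than a reliable detector.

```python
import re

# Assumed formats: SSNs written as 123-45-6789; card numbers as 13 to 19
# digits, optionally separated by single spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for SSN-like and card-like strings."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        # The Luhn check filters out most long digit runs that are not cards.
        if luhn_ok(digits):
            hits.append(("card", m.group()))
    return hits
```

A document for which `find_pii` returns a non-empty list could then be set aside for manual review and redaction.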
I've experimented a bit with Bulk Extractor Viewer but haven't been able to
get it to work on the scanned PDFs I've created. I talked to a sales rep
at Spirion and that program seems like overkill for our purposes. Any
suggestions for other things to try would be appreciated!
Thanks,
Kim
Kimberly Kennedy
Digital Production Coordinator
Northeastern University Library
[log in to unmask]