Dr. Kayaalp asked me to forward his email about the NLM-Scrubber, a tool
for removing PII and health information, to the list.
---------- Forwarded message ---------
From: Kayaalp, Mehmet (NIH/NLM/LHC) [E] <[log in to unmask]>
Date: Mon, Apr 22, 2019 at 10:37 AM
Subject: RE: Looking for lightweight tool to identify PII
To: [log in to unmask] <[log in to unmask]>
A colleague of mine forwarded your email to me.
You may find NLM-Scrubber, https://scrubber.nlm.nih.gov/, helpful, but it
is not a turnkey solution to your problem.
Although NLM-Scrubber does not handle anything but ASCII text, you may not
be able to find better freeware anywhere to de-identify your documents.
There should be a number of tools to convert PDF to ASCII format. If you
are willing to work on that prerequisite on your own, I would be happy to
help you solve your de-identification problem.
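As a rough illustration of the ASCII prerequisite mentioned above, here is a minimal Python sketch that reduces already-extracted text to plain ASCII. It assumes the text has already been pulled out of the PDF by some other tool (for scanned pages that would mean OCR, which is not shown). The function name `to_ascii` and the small punctuation map are hypothetical choices for this example and are not part of NLM-Scrubber.

```python
import unicodedata

# Hypothetical map for common typographic characters that have an obvious
# ASCII equivalent; extend it as needed for your own documents.
PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})

def to_ascii(text: str) -> str:
    """Return an ASCII-only approximation of text."""
    text = text.translate(PUNCTUATION)
    # NFKD splits accented letters into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything still outside ASCII (including the combining marks) is dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")

if __name__ == "__main__":
    print(to_ascii("Jos\u00e9 Mu\u00f1oz \u2013 caf\u00e9"))
```

Characters with no ASCII counterpart are silently dropped here, so it is worth spot-checking the output before running any de-identification step on it.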
Mehmet Kayaalp, M.D., Ph.D.
Lister Hill National Center for Biomedical Communications
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894-3828
Date: Fri, 19 Apr 2019 13:26:22 -0400
From: Kimberly Kennedy <[log in to unmask]>
Subject: Looking for lightweight tool to identify PII
We are beginning a digitization project at my institution that involves
scanning archival documents that may contain personal identifying
information, such as social security numbers or credit card numbers. I'm
looking for a tool that will examine the PDFs and identify the ones that
may contain PII, so we can then redact them.
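For readers with the same problem, the kind of check described above can be sketched in a few lines of Python. This is only an illustration, not how Bulk Extractor, Spirion, or NLM-Scrubber work. It assumes the scanned PDFs have already been run through OCR so that plain text is available, that SSNs appear in the hyphenated form 123-45-6789, and that card numbers are 13 to 16 digits, optionally grouped with spaces or hyphens. The names `find_pii` and `luhn_ok` are made up for this example.

```python
import re

# Assumed formats: hyphenated US SSNs, and 13-16 digit card numbers that may
# be grouped with single spaces or hyphens.
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to weed out digit runs that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for likely SSNs and card numbers."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_ok(digits):
            hits.append(("card", m.group()))
    return hits

if __name__ == "__main__":
    sample = "SSN 123-45-6789 and card 4111 1111 1111 1111."
    for kind, value in find_pii(sample):
        print(kind, value)
```

A document with any hit could be set aside for manual review and redaction. OCR errors in scanned material will cause misses, so a pattern check like this is a first pass at best.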
I've experimented a bit with Bulk Extractor Viewer but haven't been able to
get it to work on the scanned PDFs I've created. I talked to a sales rep
at Spirion and that program seems like overkill for our purposes. Any
suggestions for other things to try would be appreciated!
Digital Production Coordinator
Northeastern University Library