On Tue, May 15, 2018 at 6:01 PM, John Pellman <[log in to unmask]> wrote:

> Disclaimer: Some of this is probably going to be redundant with respect to
> what Becky K has already said.
> Are you using DICOMs?  If so, it's pretty straightforward to anonymize

Yes (or at least a lot of it is). I'm still waiting for samples and more
information about them. A major use case we need to deal with is the one
Becky asked about: tracking individuals through time.

In all honesty, I don't think we'll be able to do what they really want in
some cases -- namely, de-identification that maintains the full utility of
the data. As you observe, anonymization is easy; however, methods that
prevent re-identification can render the data useless. This project is about
maximizing utility while maintaining privacy in a legally compliant manner.

> I've never encountered encrypting PHI fields
> <> before and have
> never experienced it with datasets I've worked with in the past when I was
> in science.  It seems like a good idea if you absolutely need to keep the
> PHI in the headers for whatever reason, but in terms of legality,
> person-hours, stress and parsimony you're probably better off just keeping
> a separate anonymized dataset and sharing that with outsiders.

Our researchers will ship a hard drive if an authorized user needs an
entire dataset -- this stuff is too big to download. However, we're looking
for a much more granular level of control: downloadable subsets that make
the data available for a wider array of purposes. The link is helpful; I'm
totally new to this sort of problem and am flying blind.

> Another specific question, but are you working with neuroimaging data in
> particular?  If so, you might want to consider BIDS
> <>.  Since BIDS is NifTI-based, tons of
> headers
> (including those with PHI) are discarded, and the DICOM tags that
> scientists actually care about (which are typically non-identifying) are
> saved in sidecar JSON files that are paired with NifTI images.  Basically
> it's a standard explicitly geared towards promoting data sharing.  You'll
> also definitely want to make sure you deface
> <> any images before
> distribution to non-privileged researchers (assuming that you are working
> with brain images) since faces are considered to be PHI.

Some datasets contain neuroimaging data, and we've been discussing some of
these exact issues. We plan to start with the easier material; I'll check out
BIDS and the defacing methods. When we're ready for images that are
themselves PHI, we'll use monkey data for preliminary experiments, since it
isn't subject to HIPAA.