I’m curious if you can share more information about the storage, access, and intended use of the datasets. Depending on the answers, you might look at some of the following:
- creating two datasets: one with PHI scrubbed for wider distribution, and one with PHI intact that is restricted to authorized users
- if the use case includes tracking individuals over time, you might need to implement pseudonymization so records can still be linked without exposing identities
- scrubbing some PHI outright and de-identifying the rest, depending on the use cases for the dataset
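To make the pseudonymization option concrete, one common approach is to drop direct identifiers and replace linkage fields with keyed hashes, so the same individual maps to the same token across records while the key stays with the data owner. A minimal sketch in Python; the field names and the secret key are hypothetical, not from your dataset:

```python
import hmac
import hashlib

# Secret held only by the data owner (hypothetical value). Without it,
# tokens cannot be brute-forced back to identifiers as easily as plain hashes.
PSEUDONYM_KEY = b"replace-with-a-securely-stored-secret"

# Illustrative field sets: drop direct identifiers, tokenize linkage fields.
SCRUB_FIELDS = {"patient_name", "ssn", "address"}
PSEUDONYMIZE_FIELDS = {"mrn"}  # medical record number, needed for linkage

def pseudonym(value: str) -> str:
    """Deterministic keyed hash: same input -> same token, not reversible."""
    return hmac.new(PSEUDONYM_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def deidentify(record: dict) -> dict:
    """Return a copy of a metadata record that is safer to distribute."""
    out = {}
    for field, value in record.items():
        if field in SCRUB_FIELDS:
            continue  # drop outright
        elif field in PSEUDONYMIZE_FIELDS:
            out[field] = pseudonym(str(value))
        else:
            out[field] = value
    return out

record = {"mrn": "12345", "patient_name": "Jane Doe",
          "ssn": "000-00-0000", "study_date": "2018-05-11"}
clean = deidentify(record)
```

Because the HMAC is deterministic, the same MRN always yields the same token, so longitudinal tracking still works in the distributed copy, and the owner can re-derive tokens to link new records to old ones.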
More options will follow with more information. :-) One thing to note: encrypting the data does not anonymize it, but it does control access to it. So if you must keep the raw metadata, access controls would be a key part of handling PHI in your dataset.
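As a sketch of the encrypt-the-problematic-fields idea, here is what field-level encryption might look like using the third-party `cryptography` package. Fernet is symmetric, so this simplifies away the per-item public/private key pair design from the original question; the field names and workflow are hypothetical:

```python
from cryptography.fernet import Fernet

# One key per item, held by the dataset owner and shared with authorized
# users over a secure channel (hypothetical workflow, not a key-management design).
item_key = Fernet.generate_key()
f = Fernet(item_key)

metadata = {"modality": "MR", "physician": "Dr. A. Example"}  # illustrative
PHI_FIELDS = {"physician"}

# Published copy: PHI fields replaced by ciphertext, the rest left readable.
published = {
    k: f.encrypt(v.encode()).decode() if k in PHI_FIELDS else v
    for k, v in metadata.items()
}

# An authorized user who received item_key can recover the original value.
recovered = f.decrypt(published["physician"].encode()).decode()
```

Note this only hides the fields; the key distribution and revocation problems your original message raises still have to be solved around it.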
Since you are dealing with PHI, you are probably already aware that you need to stay within the rules and regulations set by HIPAA and HITECH. One more thing to keep in mind is that any state laws that go above and beyond HIPAA also apply; see California's Confidentiality of Medical Information Act (CMIA) as an example of such a state law. If you have access to a legal department or something similar, they would be better able to guide you in these matters.
> On May 11, 2018, at 4:17 PM, Kyle Banerjee <[log in to unmask]> wrote:
> Howdy all,
> We need to share large datasets containing medical imagery without
> revealing PHI. The images themselves don't present a problem due to their
> nature but the embedded metadata does.
> What approaches might work?
> Our first reaction was to encrypt problematic fields, embed a public key
> for each item in the metadata, and have that dataset owner hold a separate
> private key for each image that allows authorized users to decrypt fields.
> Keys would be transmitted via the same secure channels that would normally
> be used for authorized PHI.
> There's an obvious key management problem (any ideas for this -- central
> store would counteract the benefits the keys offer), but I'm not sure if we
> really have to worry about that. Significant key loss would be expected, but
> since the data disseminated is only a copy, a new dataset with new keys
> could be created from the original if keys were lost or known to be compromised.
> This approach has a number of flaws, but we're thinking it may be a
> practical way to achieve the effect needed without compromising private data.
> Any ideas would be appreciated. Thanks,