DeMISTifying Deidentification of PHI in Free-Formatted Text

March 2016
Approved for Public Release; Distribution Unlimited. Case Number 16-0670
©2016 The MITRE Corporation. All rights reserved.
Introduction
Tool Rationale
MITRE Identification Scrubber Toolkit (MIST)
Use Case 1 – Deidentification
Hiding in Plain Sight
Use Case 2 – Identification of PHI in e-mail
Privacy Risk Identification and Management


We are a (not-for-profit) public interest company, working with industry and academia to advance and apply science, technology, systems engineering, and strategy, enabling government and the private sector to make better decisions and implement (publicly available) solutions to complex challenges of national and global significance…

…including in the areas of natural language processing, privacy, and cybersecurity.

Open Source MITRE Identification Scrubber Toolkit (MIST)


Use Case #1: Research
Doctor's notes in research:
– Treasure trove of information
– How to disclose free-formatted text to external researchers
  • Take advantage of linguistics experts
  • Mitigate the hurdles and risk of sharing PHI
– How to mine the notes while respecting patient privacy
Solution: De-identify
Protected Health Information (PHI) Identifiers

PHI in Free-Formatted Text: De-Identification Challenge
Start with known PHI object, locate PHI elements, and de-identify
HISTORY OF PRESENT ILLNESS: The patient is a 77-year-old woman with long-standing hypertension who presented as a walk-in to me at the Sun Hill Medical Center on August 12th. Recently had been started q.o.d. on Clonidine since June 8th to taper off of the drug. Was told to start Zestril 20 mg. q.d. again. Patient sent to Jones Cardiac Unit for direct admission for cardioversion and anticoagulation, with the Cardiologist, Dr. Pearson to follow.
Sample PHI Object
• Doctors' notes
• Discharge summaries

Commercial Off-the-Shelf (COTS) de-identification tools

• While some are standalone, many are components of larger, expensive data and network management tools
• For unstructured data, identification tends to rely on 'brute force' keywords and regular expressions
  – Lists of names
  – Hand-crafted patterns requiring skilled developers
• No solution is 100% foolproof, including manual de-identification
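The 'brute force' approach described above can be sketched in a few lines: a name list plus hand-crafted regular expressions, each match replaced by a type tag. The name list, the single date pattern, and the function name `brute_force_redact` are hypothetical illustrations, not taken from any COTS product.

```python
import re

# Hypothetical lexicon and patterns, for illustration only.
NAME_LIST = ["Pearson", "Jones"]

PATTERNS = {
    # Month name followed by a day number, such as "August 12th".
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s+\d{1,2}(?:st|nd|rd|th)?\b"
    ),
    # Any whole word that appears in the name list.
    "NAME": re.compile(r"\b(?:" + "|".join(map(re.escape, NAME_LIST)) + r")\b"),
}


def brute_force_redact(text: str) -> str:
    """Replace every match of every pattern with a bracketed PHI-type tag."""
    for phi_type, pattern in PATTERNS.items():
        text = pattern.sub(f"[{phi_type}]", text)
    return text


if __name__ == "__main__":
    note = ("Patient sent to Jones Cardiac Unit on August 12th, "
            "with the Cardiologist, Dr. Pearson to follow.")
    print(brute_force_redact(note))
```

On the sample sentence this yields "Patient sent to [NAME] Cardiac Unit on [DATE], with the Cardiologist, Dr. [NAME] to follow." Note that "Jones" is tagged as a person although it names a unit, and "Sun Hill Medical Center" would pass through untouched: lists and patterns both mislabel and miss unless someone keeps extending them.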
MIST: Training a De-identification System
[Figure: MIST training cycle. Redact or mark PHI by hand in initial documents; train model from initial documents; … documents using model; train (better) model]
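The training cycle in the figure (mark by hand, train, apply the model, correct, retrain) can be sketched as a loop. MIST's actual learner is not described here; `MemorizingTagger` is a hypothetical toy stand-in that only remembers the most frequent label per token, and the simulated `reviewer` plays the human who corrects each draft.

```python
from collections import Counter
from typing import Callable

Doc = list[str]                 # a document as a list of tokens
Tagged = tuple[Doc, list[str]]  # tokens paired with per-token labels ("O" = not PHI)


class MemorizingTagger:
    """Toy stand-in for a statistical model: predicts the label most often
    seen for a token in training, and "O" for unseen tokens."""

    def __init__(self) -> None:
        self.counts: dict[str, Counter] = {}

    def train(self, corpus: list[Tagged]) -> None:
        self.counts = {}
        for tokens, labels in corpus:
            for token, label in zip(tokens, labels):
                self.counts.setdefault(token, Counter())[label] += 1

    def tag(self, tokens: Doc) -> list[str]:
        return [self.counts[t].most_common(1)[0][0] if t in self.counts else "O"
                for t in tokens]


def train_correct_retrain(seed: list[Tagged], new_docs: list[Doc],
                          correct: Callable[[Doc, list[str]], list[str]]) -> MemorizingTagger:
    corpus = list(seed)                 # 1. documents whose PHI was marked by hand
    model = MemorizingTagger()
    model.train(corpus)                 # 2. train an initial model
    for tokens in new_docs:
        draft = model.tag(tokens)       # 3. pre-tag the next document with the model
        fixed = correct(tokens, draft)  # 4. a human reviews and corrects the draft
        corpus.append((tokens, fixed))
        model.train(corpus)             # 5. retrain a (better) model on the larger corpus
    return model


if __name__ == "__main__":
    GOLD = {"Pearson": "NAME", "Jones": "NAME"}  # hypothetical reviewer knowledge

    def reviewer(tokens: Doc, draft: list[str]) -> list[str]:
        return [GOLD.get(t, label) for t, label in zip(tokens, draft)]

    seed = [(["Dr.", "Pearson", "will", "follow"], ["O", "NAME", "O", "O"])]
    model = train_correct_retrain(seed, [["Dr.", "Jones", "called"]], reviewer)
    print(model.tag(["Jones", "and", "Pearson"]))
```

The initial model knows only "Pearson"; after one corrected document it also tags "Jones", which is the point of the cycle: each pass of human correction costs less because the model pre-tags more.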
HISTORY OF PRESENT ILLNESS: The patient is a 77-year-old woman with long-standing hypertension who presented as a walk-in to me at the [REDACTED]. Recently had been started q.o.d. on Clonidine since [REDACTED] to taper off of the drug. Was told to start Zestril 20 mg. q.d. again. The patient was sent to the [REDACTED] for direct admission for cardioversion and anticoagulation, with the Cardiologist, Dr. [REDACTED] to follow.
Transform the …
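Once PHI elements have been located as character spans, the transform step shown above is mechanical. This is a minimal sketch assuming non-overlapping `(start, end, type)` spans; the `Span` type and `transform` function are hypothetical names, and the "[REDACTED]" marker is only one choice (passing `marker=None` marks each span with its PHI type instead).

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    start: int      # character offset where the PHI element begins
    end: int        # character offset just past its last character
    phi_type: str   # e.g. "NAME", "DATE", "LOCATION"


def transform(text: str, spans: list[Span], marker: Optional[str] = "[REDACTED]") -> str:
    """Replace each (non-overlapping) PHI span with `marker`, or with a
    "[TYPE]" tag when marker is None. Spans are applied right to left so
    that replacing one span never shifts the offsets of spans before it."""
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        label = marker if marker is not None else f"[{span.phi_type}]"
        text = text[:span.start] + label + text[span.end:]
    return text


if __name__ == "__main__":
    text = "with the Cardiologist, Dr. Pearson to follow."
    i = text.index("Pearson")
    spans = [Span(i, i + len("Pearson"), "NAME")]
    print(transform(text, spans))               # redact
    print(transform(text, spans, marker=None))  # mark with the PHI type
```

Working right to left is a simple way to keep stored offsets valid; the alternative is to rebuild the string left to right while tracking a running offset delta.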
Study
• 8 hours of marking (training) data
  – Narrative from patient records
  – 95% accurate, measured by precision (which reflects false positives) and recall (which reflects false negatives)
  – (Favorably) comparable to manual reviews
• The MIST results
  – Top score in the first i2b2 De-identification Challenge Evaluation
  – Used by hospitals to de-identify medical records
  – Has led to numerous collaborations on MITRE projects
  – Rapidly portable: adaptable to other domains
  – The MIST TALLAL approach works well for a large corpus
  – Precision/recall tradeoff can be adjusted
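One common way a precision/recall tradeoff is adjusted is by thresholding a per-token confidence score; whether MIST exposes its tradeoff this way is not stated here. The scores and gold set below are hypothetical, and treating an empty prediction as precision 1.0 is a convention assumed for the sketch.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision penalizes false positives; recall penalizes false negatives."""
    true_pos = len(predicted & gold)
    # Convention assumed here: an empty prediction (or empty gold set) scores 1.0.
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(gold) if gold else 1.0
    return precision, recall


def sweep(scores: dict[int, float], gold: set, thresholds: list[float]):
    """Tag a token as PHI when its score meets the threshold, for each threshold."""
    results = []
    for t in thresholds:
        predicted = {tok for tok, s in scores.items() if s >= t}
        results.append((t, *precision_recall(predicted, gold)))
    return results


if __name__ == "__main__":
    # Hypothetical tagger confidences for five candidate tokens; tokens 1, 2, 4 are truly PHI.
    scores = {1: 0.95, 2: 0.80, 3: 0.60, 4: 0.40, 5: 0.20}
    gold = {1, 2, 4}
    for t, p, r in sweep(scores, gold, [0.9, 0.5, 0.1]):
        print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold raises recall (fewer leaked PHI) at the cost of precision (more over-redaction), which is usually the preferred direction for de-identification.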
How Good is Manual De-id by Humans?
Counts of overlooked PHI
– In 100 Family Practice notes
– Containing 1,093 PHI instances
• Pairs of reviewers
• Trios of reviewers
• MIST (single model)
From: D. S. Carrell, D. J. Cronkite, B. A. Malin, J. S. Aberdeen, L. Hirschman, "Is the Juice Worth the Squeeze? Costs and Benefits of Multiple Human Annotators for Clinical Text De-identification," submitted to Methods of Information in Medicine.
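A count of overlooked PHI for a group of reviewers can be computed as the gold PHI that no member marked, assuming a pair or trio catches an instance if any one member marks it. This is an illustrative sketch with hypothetical data, not the study's published procedure.

```python
from itertools import combinations


def overlooked(gold: set, reviewer_sets: list[set]) -> int:
    """Number of gold PHI instances that no reviewer in the group marked,
    assuming a group catches an instance if any one member marks it."""
    caught = set().union(*reviewer_sets)
    return len(gold - caught)


if __name__ == "__main__":
    # Hypothetical data: five gold PHI instances and what three reviewers marked.
    gold = {"p1", "p2", "p3", "p4", "p5"}
    reviewers = {"A": {"p1", "p2", "p3"}, "B": {"p1", "p4"}, "C": {"p2", "p3", "p5"}}
    for size in (1, 2, 3):
        for group in combinations(sorted(reviewers), size):
            missed = overlooked(gold, [reviewers[r] for r in group])
            print("+".join(group), missed)
```

Adding reviewers can only shrink the overlooked count under this union assumption, but each added reviewer costs another full pass over the notes, which is the cost/benefit question the cited study examines.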
DeMISTifying Deidentification On Steroids…
Hiding in Plain Sight (HIPS)* Research

HISTORY OF PRESENT ILLNESS: The patient is a 77-year-old woman with long-standing hypertension who presented as a walk-in to me at the Sun Hill Medical Center on August 12th. Recently had been started q.o.d. on Clonidine since June 8th to taper off of the drug. Was told to start Zestril 20 mg. q.d. again. Patient sent to Jones Cardiac Unit for direct admission for cardioversion and anticoagulation, with the Cardiologist, Dr. Pearson to follow.
Hypothesis: With 'good' resynthesis, it can be nearly impossible to detect 'leaked PHI', either manually or by hackers using data mining.
Initial research results: Good resynthesis reduced the detection of PHI leaks by an additional 85% in the worst case.
Research with larger sample sizes is necessary to validate these results.
*HIPS is a collaborative research effort among GroupHealth, Vanderbilt, and MITRE.
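The resynthesis idea behind HIPS is to replace each PHI element with a believable surrogate, so that any real PHI the tagger misses hides among realistic fakes. The sketch below is not the HIPS implementation; the surrogate pools and the `resynthesize` function are hypothetical. It keeps one property that matters for believability: repeated mentions of the same original get the same surrogate.

```python
import random
import re

# Hypothetical surrogate pools; a real system would draw from large,
# realistic, type-appropriate lists.
SURROGATES = {
    "NAME": ["Alvarez", "Brennan", "Okafor"],
    "LOCATION": ["Riverside Clinic", "Maple Street Hospital"],
    "DATE": ["March 3rd", "October 21st"],
}


def resynthesize(text: str, spans: list[tuple[int, int, str]], seed: int = 0):
    """Replace each (start, end, phi_type) span with a believable surrogate.
    The same original string of the same type always gets the same surrogate,
    so repeated mentions stay consistent. Spans must not overlap. This simple
    version may give two different originals the same surrogate."""
    rng = random.Random(seed)
    mapping: dict[tuple[str, str], str] = {}
    out = text
    for start, end, phi_type in sorted(spans, reverse=True):  # right to left keeps offsets valid
        key = (phi_type, text[start:end])
        if key not in mapping:
            mapping[key] = rng.choice(SURROGATES[phi_type])
        out = out[:start] + mapping[key] + out[end:]
    return out, mapping


if __name__ == "__main__":
    note = "Dr. Pearson saw the patient. Dr. Pearson will follow."
    spans = [(m.start(), m.end(), "NAME") for m in re.finditer("Pearson", note)]
    fake, mapping = resynthesize(note, spans)
    print(fake)
```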



Use Case #2 – Identifying PHI in Free-Formatted Text
"Dr. Famous and his Laptop"(1): loss of control over an unencrypted laptop containing e-mail that may include protected health information (PHI)
• An e-mail backup is available
• Need to establish the extent of PHI content in the e-mail for the Health Insurance Portability and Accountability Act (HIPAA)
• Solution: intensive manual review ($$$$$$)
Research Seedling: Could MIST facilitate PHI discovery in e-mail?
1. Tagging and Modeling: Can MIST successfully 'tag' PHI identifiers in e-mail and be trained to model e-mail tagging?
2. If Step 1 is successful, identify prospective next steps for repurposing MIST as a tool for locating PHI in e-mail.
(1) Halamka, John D., "Surviving the Cybersecurity Cold War: A CIO's Practical Guide for Risk Management," slide 1.



Problem Scope – Compromised Assets
2012 Ponemon Institute Study(2) on Provider Breaches:
• … of healthcare organizations had at least one breach in the prior 2 years
• … had more than 5 breaches in the prior 2 years
• … of breaches were caused by a lost or stolen computing device
• … of organizations permit employees and medical staff to use their own personal mobile devices to connect to their covered entity providers' networks or enterprise systems
• Enterprise data-at-rest encryption solutions offer partial risk mitigation
(1) Department of Health and Human Services, 45 CFR Parts 160 and 164, Federal Register, Vol. 78, No. 17, Part II, January 25, 2013.
(2) Ponemon Institute Research Report, Third Annual Benchmark Study on Patient Privacy and Data Security, December 2012.
Identifying other types of sensitive data in compromised assets
– Sensitive personally identifiable information (PII), such as financial information (as defined by the Gramm-Leach-Bliley Act)
– PII as defined in state consumer protection and/or breach notification laws
– Proprietary data, Sensitive but Unclassified (SBU), etc.
Loss of control over a 'device' with potentially multiple sources of sensitive unstructured data:
– Risk assessment requires reconstructing the contents of the device from
  • Backups
  • Information on application servers (e-mail, SharePoint, databases, etc.)
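When the device contents are reconstructed from e-mail backups, each message must be turned into plain text before a tagger can look for PHI. This sketch uses Python's standard `email` package; the function names and the choice to keep only the Subject header and non-attachment `text/plain` parts are assumptions for illustration, and a fuller tool would also handle HTML parts and attachments.

```python
import email
import unicodedata
from email import policy


def clean(text: str) -> str:
    """Drop control characters (Unicode category Cc) except newline and tab."""
    return "".join(ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc")


def extract_text(raw: bytes) -> list[str]:
    """Return the subject line and every non-attachment text/plain body of a
    raw message, cleaned and ready to hand to a PHI tagger."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    texts = []
    if msg["subject"] is not None:
        texts.append(clean(str(msg["subject"])))  # headers can carry PHI too
    for part in msg.walk():
        if part.get_content_type() == "text/plain" and not part.is_attachment():
            texts.append(clean(part.get_content()))
    return texts


if __name__ == "__main__":
    raw = (b"From: clinic@example.com\r\n"
           b"Subject: Follow-up for Mr. Jones\r\n"
           b"Content-Type: text/plain; charset=utf-8\r\n"
           b"\r\n"
           b"Please call Mr. Jones about his August 12th visit.\r\n")
    for text in extract_text(raw):
        print(repr(text))
```

Stripping control characters such as carriage returns is one small part of why e-mail differs from clinical notes: the tagger sees a mix of headers, quoted replies, and body text rather than a single narrative.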
The Challenge of PHI Discovery
Protected Health Information is information that:
– Is created or received by a covered entity, AND
– Identifies an individual (or could be used to identify one), i.e., contains PHI identifiers, AND
– Relates to the individual's past, present, or future physical or mental health, the provision of health care to the individual, or payment for that care
HIPAA's safe-harbor de-identification standard lists 17 identifier fields plus a catch-all.
Identifying PHI identifiers in e-mail with MIST:
– E-mail is different from the narrative, clinical-note domain: its structure and context mix text and control characters
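The three-part definition above is a conjunction, and a tagger such as MIST can only inform the "identifies an individual" part. This sketch makes that structure explicit; it is a simplification for illustration, not a legal test, and the identifier labels, the `Item` type, and `looks_like_phi` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical labels for a few of the safe-harbor identifier categories
# (the standard itself lists 17 categories plus a catch-all).
IDENTIFIER_TYPES = {"NAME", "DATE", "PHONE", "EMAIL", "SSN", "MRN", "GEOGRAPHIC"}


@dataclass
class Item:
    from_covered_entity: bool   # created or received by a covered entity?
    health_related: bool        # health, provision of care, or payment for care?
    identifier_types: set = field(default_factory=set)  # e.g. types a tagger found


def looks_like_phi(item: Item) -> bool:
    """All three conditions must hold. A tagger only informs the
    'identifiable' condition; the other two need context about the item."""
    identifiable = bool(item.identifier_types & IDENTIFIER_TYPES)
    return item.from_covered_entity and identifiable and item.health_related


if __name__ == "__main__":
    print(looks_like_phi(Item(True, True, {"NAME", "DATE"})))
    print(looks_like_phi(Item(True, False, {"NAME"})))  # identifiable but not health-related
```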
MIST E-mail PHI Identification Results
Based on one run:
– Precision (fraction of identified terms that were relevant) = 0.733
– Recall (fraction of relevant terms that were identified) = 0.738
– F-measure = 2 × (precision × recall) / (precision + recall) = 0.736
After additional modeling:
– Precision: 0.899
– Recall: 0.870
– F-measure: 0.885
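The F-measure formula above can be checked directly against the reported figures. This is a minimal sketch; the function name `f_measure` is ours.

```python
def f_measure(precision: float, recall: float) -> float:
    """F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


if __name__ == "__main__":
    runs = [("one run", 0.733, 0.738), ("additional modeling", 0.899, 0.870)]
    for label, p, r in runs:
        print(f"{label}: F = {f_measure(p, r):.3f}")
```

Recomputing from the three-decimal values gives 0.735 and 0.884, within 0.001 of the reported 0.736 and 0.885. That small gap is consistent with the reported precision and recall having been rounded before display.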
Given (many) constraints, MIST performance was (very) encouraging for e-mail processing:
– Opportunity for significant improvement with an enhanced tag set and e-mail-aware blocking software
– Additional hours/dollars are needed for a product-ready e-mail solution
Privacy Risk Management Automated Tool
Automated tools to support data-associated risk management:
– PHI de-identification of doctor's notes
– Hiding in Plain Sight: resynthesis of believable fake data
– Tailorable (TALLAL) tool at the ready for assessing leakage of {PHI, PII, proprietary information, sensitive information, …} in free-formatted text (e.g., e-mail)
How about a tailorable automated tool that supports privacy risk management?
What is MITRE's Privacy Risk Identification and Management Engine ((P)RIME*)?

RIME is a foundation for organizations to normalize and manage risk
– The 'organization' defines the RIME-hosted instance
– Web front end with a database back end
– RIME provides engines for:
  • Risk managers
  • PII owners (business application, program, system)
  • Dynamic questionnaires
  • Automated compliance
  • Cursor-sensitive, as-needed document generation
  • Risk management (raw vs. …)
  • Immediate risk feedback
* (P)RIME was initially designed with privacy as the risk focus
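A dynamic questionnaire with immediate risk feedback can be sketched as questions that are asked only when earlier answers make them relevant, each able to raise a risk the moment it is answered. RIME's actual design is not described here; the `Question` structure, the question set, and the risk messages are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Answers = dict[str, bool]


@dataclass
class Question:
    qid: str
    text: str
    # Ask only when this predicate over earlier answers holds (None = always ask).
    ask_if: Optional[Callable[[Answers], bool]] = None
    # Risk message raised as soon as the answer is "yes" (None = no risk).
    risk_if_yes: Optional[str] = None


# Hypothetical question set; a real instance would be organizationally defined.
QUESTIONS = [
    Question("collects_pii", "Does the system collect PII?"),
    Question("collects_ssn", "Does it collect Social Security numbers?",
             ask_if=lambda a: a.get("collects_pii", False),
             risk_if_yes="SSNs collected: document the need"),
    Question("mobile_access", "Can staff reach it from personal mobile devices?",
             ask_if=lambda a: a.get("collects_pii", False),
             risk_if_yes="PII reachable from unmanaged devices"),
]


def run(questions: list[Question], respond: Callable[[Question], bool]):
    """Walk the questionnaire, skipping branches that earlier answers make
    irrelevant, and collect risk feedback the moment it is triggered."""
    answers: Answers = {}
    risks: list[str] = []
    for q in questions:
        if q.ask_if is not None and not q.ask_if(answers):
            continue
        answers[q.qid] = respond(q)
        if answers[q.qid] and q.risk_if_yes:
            risks.append(q.risk_if_yes)
    return answers, risks


if __name__ == "__main__":
    canned = {"collects_pii": True, "collects_ssn": False, "mobile_access": True}
    answers, risks = run(QUESTIONS, lambda q: canned[q.qid])
    print(risks)
```

A PII owner who answers "no" to the first question is never shown the follow-ups, which is how a dynamic questionnaire keeps the burden proportional to the actual risk.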
Risk Identification and Management Engine
– Dynamic Questionnaire: context-dependent, cursor-sensitive help
– Risk Analysis: organizationally-defined risk identification, priority
– Risk Management: dashboard generation
– Document Creation: (compliance) documents
User-facing artifacts:
– Privacy Impact Assessments (PIAs)
– Privacy Threshold Assessments (PTAs)
– "Data-Rich" completed …
[P]RIME 'Big' Ideas
• Move away from Word, etc., documents as the risk 'tool'
• Push privacy SME-ness to PII owners
  – Overt, timely awareness by making privacy risk more explicit
• Separate data gathering from risk analysis and [compliance or other] document generation
• Empower risk managers with risk metrics and tools
• Eliminate redundancies (and wasted time), reduce …
  – Inject discipline and consistency
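Separating data gathering from risk analysis and document generation can be sketched as three independent stages, so that the same gathered facts feed any analysis and any output document without re-asking questions. The stage names, rules, and severity scale are hypothetical and do not describe PRIME's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    issue: str
    severity: str  # hypothetical scale: "low", "medium", "high"


def gather(raw_form: dict) -> dict:
    """Data gathering only: normalize keys, make no risk judgments."""
    return {key.strip().lower(): value for key, value in raw_form.items()}


def analyze(facts: dict) -> list[Finding]:
    """Risk analysis only: hypothetical rules applied to the gathered facts."""
    findings = []
    if facts.get("stores_phi") and not facts.get("encrypted_at_rest"):
        findings.append(Finding("PHI stored without data-at-rest encryption", "high"))
    if facts.get("shares_externally"):
        findings.append(Finding("Data shared with external parties", "medium"))
    return findings


def render(system: str, findings: list[Finding]) -> str:
    """Document generation only: the same findings could instead feed a PIA,
    a PTA, or a dashboard without re-asking any questions."""
    lines = [f"Privacy risk summary: {system}"]
    lines += [f"- [{f.severity.upper()}] {f.issue}" for f in findings] or ["- No risks identified"]
    return "\n".join(lines)


if __name__ == "__main__":
    facts = gather({" Stores_PHI ": True, "encrypted_at_rest": False, "shares_externally": False})
    print(render("Clinic Scheduler", analyze(facts)))
```

Because each stage only consumes the previous stage's output, a rule change in `analyze` or a new document template in `render` needs no new data collection, which is where the redundancy savings come from.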
P[RIME] Demo
• Functionality is real
• Production use
• Ready for transfer to sponsor and to industry
  – For optimal results, MITRE should assist with the initial instantiation
• Instantiation examples
  – Traditional PIA, breach, and inventory support
  – 'Down-select' from a complete set of possible risk issues
  – Automated requirements/testing support
MIST, HIPS, PRIME, and other Privacy or Cybersecurity Tools

For more information, please contact Cathy Petrozzino at

Source: http://www2.mitre.org/work/health/himss/pdf/HIMSS16-FH27-Petrozzino.pdf
