Topic

De-identification

About: De-identification is a research topic. Over its lifetime, 306 publications have been published within this topic, receiving 5,676 citations.

Papers

Journal Article • 10.1186/1472-6947-8-32

Automated de-identification of free-text medical records

Ishna Neamatullah¹, Margaret M Douglass¹, Li-wei H. Lehman¹, Andrew T. Reisner¹, Mauricio Villarroel¹, William J. Long¹, Peter Szolovits¹, George B. Moody¹, Roger G. Mark¹, Gari D. Clifford¹

Massachusetts Institute of Technology¹

24 Jul 2008 - BMC Medical Informatics and Decision Making

TL;DR: In this article, an automated Perl-based de-identification software package is described that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc.

Abstract: Text-based patient medical records are a vital resource in medical research. To preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming, and error-prone, so large-scale de-identification requires automated methods.

We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, and X-ray reports. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus.

Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, a precision of 0.749, and a fallout of approximately 0.002. On the test corpus, a total of 90 instances of false negatives were found, or 27 per 100,000 word count, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either corpus.

We have developed a pattern-matching de-identification system based on dictionary look-ups, regular expressions, and heuristics. Evaluation based on two different sets of nursing notes collected from a U.S. hospital suggests that, in terms of recall, the software outperforms a single human de-identifier (0.81) and performs at least as well as a consensus of two human de-identifiers (0.94). The system is currently tuned to de-identify PHI in nursing notes and discharge summaries but is sufficiently general that it can be customized to handle text files of any format. Although the accuracy of the algorithm is high, it is probably insufficient to be used to publicly disseminate medical data. The open-source de-identification software and the gold standard re-identified corpus of medical records have therefore been made available to researchers via the PhysioNet website to encourage improvements in the algorithm.
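The combination of look-up tables, regular expressions, and simple heuristics described in this abstract can be illustrated with a minimal Python sketch. It is not the published Perl software: the name dictionary, the patterns, the `[**LABEL**]` placeholder format, and the function names `scrub` and `detection_metrics` are all assumptions made for illustration. The second function shows how recall, precision, and fallout are conventionally computed from counts of true and false positives and negatives.

```python
import re

# Hypothetical look-up table; a real system would load large name dictionaries.
KNOWN_NAMES = {"smith", "garcia", "nguyen"}

# Illustrative patterns only; not the patterns used by the published software.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
]
# Heuristic: a capitalized word right after a title is treated as a name.
TITLE_NAME = re.compile(r"\b((?:Dr|Mr|Mrs|Ms)\.?\s+)([A-Z][a-z]+)")
WORD = re.compile(r"\b[A-Za-z]+\b")


def scrub(text):
    """Replace suspected PHI in free text with category placeholders."""
    # Step 1: regular expressions for well-structured PHI (dates, phones).
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[**{label}**]", text)
    # Step 2: context heuristic for names that follow a title.
    text = TITLE_NAME.sub(r"\g<1>[**NAME**]", text)

    # Step 3: dictionary look-up for known names anywhere in the text.
    def replace_known(match):
        word = match.group(0)
        return "[**NAME**]" if word.lower() in KNOWN_NAMES else word

    return WORD.sub(replace_known, text)


def detection_metrics(tp, fp, fn, tn):
    """Return (recall, precision, fallout) from token-level counts."""
    recall = tp / (tp + fn)        # share of true PHI that was found
    precision = tp / (tp + fp)     # share of flagged tokens that were PHI
    fallout = fp / (fp + tn)       # share of non-PHI wrongly flagged
    return recall, precision, fallout


note = "Pt seen by Dr. Patel on 3/14/2008; daughter Smith called 617-555-0123."
print(scrub(note))
# Pt seen by Dr. [**NAME**] on [**DATE**]; daughter [**NAME**] called [**PHONE**].
```

In this sketch "Patel" is caught only by the title heuristic and "Smith" only by the dictionary, which is why such systems typically layer several detectors rather than relying on one.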

453 citations

Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule

Bill Fitzgerald

1 Jan 2015

TL;DR: In this paper, the authors provide guidance about methods and approaches to achieve de-identification in accordance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule.

Abstract: This page provides guidance about methods and approaches to achieve de-identification in accordance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule. The guidance explains and answers questions regarding the two methods that can be used to satisfy the Privacy Rule’s de-identification standard: Expert Determination and Safe Harbor. This guidance is intended to help covered entities understand what de-identification is, the general process by which de-identified information is created, and the options available for performing de-identification.
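As a rough illustration of the Safe Harbor method on a structured record, the following Python sketch applies three rules as they are commonly summarized: direct identifiers are dropped, dates are reduced to the year, ages over 89 are grouped as 90 or older, and ZIP codes are cut to their first three digits. The field names, the `DIRECT_IDENTIFIERS` set, and the function name are hypothetical, and the sketch covers only a few of the identifier categories, so it should not be read as a compliant implementation.

```python
from datetime import date

# Hypothetical field names standing in for some direct identifiers.
DIRECT_IDENTIFIERS = {"name", "phone", "email", "ssn", "mrn"}


def safe_harbor_sketch(record):
    """Return a copy of `record` with a few Safe Harbor style rules applied."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if isinstance(value, date):
            out[key + "_year"] = value.year  # keep only the year of a date
        elif key == "age":
            out[key] = "90+" if value > 89 else value  # group ages over 89
        elif key == "zip":
            # Keep the first three digits only. The population-size check
            # that applies to three-digit ZIP areas is omitted here.
            out["zip3"] = str(value)[:3]
        else:
            out[key] = value
    return out


patient = {
    "name": "Jane Roe",
    "age": 93,
    "zip": "02139",
    "admitted": date(2008, 7, 24),
    "diagnosis": "pneumonia",
}
print(safe_harbor_sketch(patient))
# {'age': '90+', 'zip3': '021', 'admitted_year': 2008, 'diagnosis': 'pneumonia'}
```

Expert Determination, the other method named in the guidance, relies on a qualified expert's assessment of re-identification risk and does not reduce to a fixed rule list like this one.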

433 citations

Journal Article • 10.1093/JAMIA/OCW156

De-identification of patient notes with recurrent neural networks.

Franck Dernoncourt¹, Ji Young Lee¹, Özlem Uzuner², Peter Szolovits¹

Massachusetts Institute of Technology¹, University at Albany, SUNY²

01 May 2017 - Journal of the American Medical Informatics Association

TL;DR: Introduces the first de-identification system based on artificial neural networks (ANNs). Unlike existing systems, it requires no handcrafted features or rules, and it outperforms state-of-the-art systems.

370 citations

Journal Article • 10.1186/1471-2288-10-70

Automatic de-identification of textual documents in the electronic health record: a review of recent research

Stéphane M. Meystre¹, F. Jeffrey Friedlin², Brett R. South¹, Shuying Shen¹, Matthew H. Samore¹

University of Utah¹, Regenstrief Institute²

02 Aug 2010 - BMC Medical Research Methodology

TL;DR: A review of recent research in automated de-identification of narrative text documents from the electronic health record finds methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize.

Abstract: Background: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Institutional Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA “Safe Harbor” technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often done manually and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here.

Methods: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers.

Results: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries.

Conclusions: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and “over-scrubbing” are discussed in this publication.

353 citations

Proceedings Article • 10.1109/CVPRW.2006.125

Model-Based Face De-Identification

Ralph Gross¹, Latanya Sweeney¹, F. De la Torre¹, Simon Baker¹

Carnegie Mellon University¹

17 Jun 2006

TL;DR: Extensive experiments show that pixelation and blurring offer very poor privacy protection while significantly distorting the data. A novel framework for de-identifying facial images is introduced that combines a model-based face image parameterization with a formal privacy protection model.

Abstract: Advances in camera and computing equipment hardware in recent years have made it increasingly simple to capture and store extensive amounts of video data. This, among other things, creates ample opportunities for the sharing of video sequences. In order to protect the privacy of subjects visible in the scene, automated methods to de-identify the images, particularly the face region, are necessary. The majority of privacy protection schemes currently used in practice rely on ad-hoc methods such as pixelation or blurring of the face. In this paper we show in extensive experiments that pixelation and blurring offer very poor privacy protection while significantly distorting the data. We then introduce a novel framework for de-identifying facial images. Our algorithm combines a model-based face image parameterization with a formal privacy protection model. In experiments on two large-scale data sets we demonstrate privacy protection and preservation of data utility.
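To make the ad-hoc baseline concrete, here is a minimal Python sketch of pixelation on a grayscale image stored as a list of rows. It illustrates only the pixelation technique that the abstract evaluates and finds inadequate, not the paper's model-based framework; the function name `pixelate` and the integer averaging are assumptions made for illustration.

```python
def pixelate(image, block):
    """Replace each block x block tile of a grayscale image with its mean.

    `image` is a list of equal-length rows of integer intensities.
    Tiles at the right and bottom edges may be smaller than `block`.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input is left unchanged
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [
                (r, c)
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            ]
            # Integer mean of the tile, written back to every cell in it.
            mean = sum(image[r][c] for r, c in cells) // len(cells)
            for r, c in cells:
                out[r][c] = mean
    return out


face = [
    [0, 10, 100, 110],
    [20, 30, 120, 130],
]
print(pixelate(face, 2))
# [[15, 15, 115, 115], [15, 15, 115, 115]]
```

Each tile keeps its average intensity, so coarse structure of the face survives the transformation while fine detail is lost.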

226 citations

Performance Metrics

306 papers

1,314 citations

No. of papers in the topic in previous years
Year	Papers
2021	36
2020	31
2019	34
2018	17
2017	38
2016	31
