Newly developed software will help to allay patients' fears about who has access to their confidential data. Research published today in the open access journal BMC Medical Informatics and Decision Making describes a computer program capable of deleting details that may identify patients from medical records, while leaving important medical information intact.
Patient records that are to be shared within the research community must have any identifying information removed. Manual removal of identifying information is prohibitively expensive and time-consuming. Considerable research by many investigators has focussed on developing automated techniques for "de-identifying" medical records. A team from the Massachusetts Institute of Technology (MIT), funded by the National Institutes of Health (NIH), aimed to solve this problem, pointing out that: "Text-based patient medical records are a vital resource in research. The expense of manual de-identification, coupled with the fact that it is time-consuming and prone to error, necessitates automatic methods for large-scale de-identification."
The MIT team tested their censoring software on a meticulously hand-annotated database of 1836 nursing notes (a total of 296,400 words). According to the authors, "The software successfully deleted more than 94% of the confidential information, while wrongly deleting only 0.2% of the useful content. This is significantly better than one expert working alone, at least as good as two trained medical professionals checking each other's work and many, many times faster than either."
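To make the two percentages quoted above concrete, the following is a minimal sketch of how such figures could be computed from a hand-annotated reference. It assumes word-level counting, which the release does not confirm, and the function name deid_metrics and the toy labels are hypothetical, not taken from the MIT software or its data.

```python
def deid_metrics(is_identifying, was_removed):
    """Return (share of identifying words removed,
               share of non-identifying words wrongly removed).

    Both arguments are parallel lists of booleans, one entry per word.
    Word-level counting is an assumption made for this illustration.
    """
    if len(is_identifying) != len(was_removed):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(is_identifying, was_removed))
    identifying_total = sum(1 for ident, _ in pairs if ident)
    useful_total = len(pairs) - identifying_total

    # Identifying words that the software correctly deleted.
    identifying_removed = sum(1 for ident, rem in pairs if ident and rem)
    # Useful (non-identifying) words that the software deleted by mistake.
    useful_removed = sum(1 for ident, rem in pairs if not ident and rem)

    removed_share = identifying_removed / identifying_total if identifying_total else 0.0
    wrongly_removed_share = useful_removed / useful_total if useful_total else 0.0
    return removed_share, wrongly_removed_share


# Toy example: 10 words, 3 of them identifying (hypothetical labels).
is_identifying = [True, True, False, False, False, False, True, False, False, False]
was_removed = [True, True, False, True, False, False, False, False, False, False]

removed_share, wrongly_removed_share = deid_metrics(is_identifying, was_removed)
print(f"identifying words removed: {removed_share:.1%}")
print(f"useful words wrongly removed: {wrongly_removed_share:.1%}")
```

On this reading, the reported results would correspond to a first value above 0.94 and a second value of about 0.002 across the 296,400-word test set.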
The MIT team is also providing access to the fully scrubbed, annotated data together with the software, both to allow others to improve their systems and to allow the software to be adapted to other data types that may exhibit different qualities.
Notes to Editors
1. Automated De-Identification of Free-Text Medical Records
Ishna Neamatullah, Margaret M Douglass, Li-wei H Lehman, Andrew Reisner, Mauricio Villarroel, William J Long, Peter Szolovits, George B Moody, Roger G Mark and Gari D Clifford
BMC Medical Informatics and Decision Making (in press)
During embargo, article available here: http://www.
After the embargo, article available at journal website: http://www.
Please name the journal in any story you write. If you are writing for the web, please link to the article. All articles are available free of charge, according to BioMed Central's open access policy.
Article citation and URL available on request at firstname.lastname@example.org on the day of publication.
2. The National Institute of Biomedical Imaging and Bioengineering, one of the National Institutes of Health, funded this project.
3. BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in information management, systems and technology in healthcare and the study of medical decision making. BMC Medical Informatics and Decision Making (ISSN 1472-6947) is indexed/tracked/covered by PubMed, MEDLINE, CAS, Scopus, EMBASE, Thomson Scientific (ISI) and Google Scholar.
4. BioMed Central (http://www.