The era of functional genomics has enabled scientists to analyze massive amounts of data on cellular activity in health and disease. The more widely these data are shared between labs, the greater scientists' power to find genes linked to disease.
This widespread sharing of functional genomics data, however, creates a conundrum: it also makes the genetic privacy of individuals harder to protect.
In a new report, a team of Yale scientists describes a way to protect people's private genetic information while preserving the benefits of a free exchange of functional genomics data among researchers.
The report, published Nov. 12 in the journal Cell, was led by senior author Mark Gerstein, the Albert L. Williams Professor of Biomedical Informatics and professor of molecular biophysics and biochemistry, of computer science, and of statistics and data science. The first author is Gamze Gursoy, a postdoctoral researcher in Gerstein's lab.
"Genetic information is the most fundamental information of all," Gerstein said. "If somebody gets access to your financial information, you can still get a new credit card. But once a genome is in a database, you are stuck -- and so are your children and grandchildren."
The widespread use of genetic testing by services such as Ancestry.com has already allowed individuals to identify relatives they had not known about. However, the huge genetic databases collected by scientists could also be put to less benign uses.
For instance, someone with malicious intent who obtained DNA from a discarded coffee cup could, in theory, identify a person with HIV if that person had previously participated in a study about AIDS. Beyond the threat of blackmail, such a disclosure could lead life insurance companies to refuse that individual coverage. Similar risks exist for others, such as people at high risk of developing cancer.
The privacy risk spans generations. Because individual genomic data are never erased, the grandson of a man with schizophrenia might one day face discrimination because of his inherited genetic predisposition to the disease.
There are societal risks as well. For instance, hostile foreign governments could hack into databases in search of potentially damaging genetic information about U.S. citizens. Or authoritarian governments could use such data, as in so-called eugenics programs, to identify and harm individuals with "undesirable traits."
"Genetics has a problematic history," Gerstein said.
To address these privacy threats, Gursoy and Gerstein developed a method to quantify how much private information a study's data might be "leaking," that is, how much of the data could identify individual participants. They were then able to "sanitize" the data, removing or blocking access to the small amount of individually identifiable genetic information while preserving the great majority of the data for use by researchers.
"We can protect individual privacy while still encouraging people to participate in genetic studies that are undeniably good for society," Gerstein said.