ATLANTA--Georgia State University faculty have received a five-year, $3 million federal grant to further develop a tool that will allow researchers around the world to participate in extensive brain imaging analysis without sharing protected patient data.
"When you have many players in a research consortium, and each player has their own dataset, they don't always trust one another, or there may be regulatory hurdles due to privacy concerns or international policies" said Sergey Plis, associate professor of computer science and co-principal investigator on the grant. "To pull together researchers on large-scale projects, we have to find a way to bypass these barriers to sharing information."
Experts have many unanswered questions about how the complex and dynamic human brain is affected by mental health and neurodegenerative disorders. To get clinically useful answers that can be applied to a large population of subjects, scientists use machine learning models that analyze thousands of individual brain scans, searching for patterns. However, these massive datasets are typically stitched together from smaller datasets provided by hospitals, healthcare systems and other institutions, and sharing data while also protecting subject privacy and anonymity can be a huge hurdle.
Vince Calhoun, co-principal investigator on the grant and Distinguished University Professor of Psychology, and his research team have developed a software tool called COINSTAC, which allows data analyses to be performed locally at participating sites, combining and sharing only the results. Using the grant from the National Institute on Drug Abuse, Calhoun and Plis plan to adapt COINSTAC to make it compatible with deep learning models so they can be trained on multiple decentralized databases.
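As a rough illustration of "combining and sharing only the results," the sketch below has each site reduce its own data to a few summary numbers, which a coordinator then merges into a pooled mean and variance. It is a minimal sketch of the general idea, not COINSTAC's actual code or API: the names `LocalSummary`, `summarize_locally` and `combine`, and the sample values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class LocalSummary:
    """Aggregate statistics a site agrees to share (no raw values)."""
    count: int
    total: float
    total_sq: float


def summarize_locally(values: Sequence[float]) -> LocalSummary:
    """Runs at a participating site; only the summary leaves the site."""
    return LocalSummary(
        count=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def combine(summaries: List[LocalSummary]) -> Dict[str, float]:
    """Runs at the coordinator, which sees only per-site summaries."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no data contributed by any site")
    mean = sum(s.total for s in summaries) / n
    # Population variance from pooled sums: E[x^2] - (E[x])^2.
    variance = sum(s.total_sq for s in summaries) / n - mean ** 2
    return {"n": n, "mean": mean, "variance": variance}


# Hypothetical per-site measurements (made-up numbers for illustration).
site_a = [1.0, 2.0, 3.0]
site_b = [4.0, 5.0]
result = combine([summarize_locally(site_a), summarize_locally(site_b)])
```

The pooled statistics match what a central analysis of all five values would produce, yet neither site's individual values are transmitted. Real analyses such as deep learning would exchange model updates rather than sums, but the division of labor is the same.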
"COINSTAC lets us avoid the step of actually aggregating private data, while retaining the strength of large-scale analyses. We're bringing the computation to the data, not the other way around," said Calhoun, who is also director of the Center for Translational Research in Neuroimaging and Data Science (TReNDS), and a Georgia Research Alliance Eminent Scholar. "Now, because of the demand for deep learning, we need to enhance the security to allow for the sharing of a greater volume of data without compromising privacy."
Calhoun and his team began developing COINSTAC in 2014, predating a similar analysis method popularized by Google and known as "federated learning." One way to ensure privacy in federated learning is through a system known as differential privacy, which adds noise to the data to obscure information that might reveal sensitive details about the participants. Another way is to use "bank-level" encryption standards to prevent the data from being easily accessed or deciphered. Both systems will be used in COINSTAC.
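One common way to implement differential privacy is the Laplace mechanism, which adds calibrated random noise to a computed result before it is released. The sketch below applies it to a simple average. It is an illustration under stated assumptions (values are clipped to a known range, and the number of records is treated as public), not a description of how COINSTAC implements differential privacy. The function names and sample values are hypothetical.

```python
import random
from typing import Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(
    values: Sequence[float],
    lower: float,
    upper: float,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Release a noisy mean; a smaller epsilon means more noise."""
    if not values:
        raise ValueError("values must not be empty")
    if epsilon <= 0 or upper <= lower:
        raise ValueError("need epsilon > 0 and upper > lower")
    # Clipping bounds how much any one record can move the mean.
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Assumption: record count is public, so changing one record shifts
    # the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical measurements, made up for illustration.
values = [2.0, 4.0, 6.0, 8.0]
noisy = private_mean(values, lower=0.0, upper=10.0, epsilon=1.0,
                     rng=random.Random(0))
```

The parameter `epsilon` sets the trade-off: a small value hides each participant more thoroughly but makes the released number less accurate.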
"Today, data such as MRI scans is not easily interpretable, but in the future, it's possible someone could figure out how to use it as a kind of fingerprint, so you want to protect it," said Plis, director of machine learning core at TReNDS. "Therefore, anonymizing the scans isn't enough. Differential privacy allows researchers to extract useful information from the algorithms without revealing private data about any individual."
Using the COINSTAC platform, participating institutions in a research project will be able to set their own privacy "budgets," depending on the sensitivity of the data.
"Perhaps the subjects in one dataset have a very rare disease, meaning the privacy needs to be tight," said Plis. "The institution can say, 'once this person's data is used in a small number of computations, I want to take it offline.' Other datasets may have a larger privacy budget."
The team hopes the enhanced functionality will pave the way for widespread use of COINSTAC in the broader neuroscience community. The researchers plan to test the platform through a partnership with the Enhancing Neuro Imaging Genetics Through Meta-Analysis (ENIGMA) Addiction Group, a collaboration of addiction researchers from around the world. The group is unable to perform deep learning analyses on data that cannot be pooled in a central location.
Using COINSTAC's new functions, Calhoun and Plis will evaluate the impact on the brain of six classes of substances (methamphetamines, cocaine, cannabis, nicotine, opiates and alcohol) and their combinations. The ultimate goal is to create an easy-to-use, scalable platform that enables greater data sharing through its decentralized approach to deep learning analyses.