News Release

Machine learning clinical prediction models fail to generalize across trial data

Peer-Reviewed Publication

American Association for the Advancement of Science (AAAS)

Clinical prediction models for schizophrenia treatment outcomes that work well within the trials from which they were developed fail to generalize to future trials, according to a new study. “The findings not only highlight the necessity for more stringent methodological standards for machine learning approaches but also require reexamination of the practical challenges that precision medicine is facing,” writes Frederike Petzschner in a related Perspective.

Despite receiving the same treatments for the same afflictions, some patients get better while others show no improvement. Precision medicine approaches seek to address this problem by tailoring treatments to individual patients. Machine learning models that can mine large and complex data to pinpoint the genetic, socioeconomic, or biological markers that predict the right treatment for an individual are considered promising tools for improving precision medicine and medical outcomes. However, these models are often validated only on datasets or clinical contexts for which the outcome, such as the response to a given treatment, is already known. How these models perform on unseen data or independent patient samples is a critical question, yet it isn’t well understood.

To address this, Adam Chekroud and colleagues evaluated how well machine learning models generalized across several independent clinical trials of antipsychotic medication for schizophrenia. While the models predicted patient outcomes with high accuracy within the dataset from which they were developed, their performance fell to no better than chance when applied to independent trial data. Even when data from multiple clinical trials were aggregated to train the models, their predictions still failed to generalize to a new, independent clinical trial. The findings suggest that a model’s approximations based on a single dataset provide very limited insight into its future performance.
The authors highlight three key reasons why this is likely. “The field as a whole… hope that machine learning approaches can eventually improve the allocation of treatments in medicine; however, we should a priori remain skeptical of any predictive model findings that lack an independent sample for validation,” write Chekroud et al.
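
For readers who want to see what validating on an independent sample can look like in practice, the following is a minimal sketch of a leave-one-trial-out check. It is not the authors' method or data. The trials are synthetic, the classifier is a simple nearest-centroid rule swapped in for illustration, and every name (`simulate_trial`, `trial_A`, and so on) is hypothetical. The simulation assumes that each trial's outcome depends on a different feature, which is one way, chosen here for illustration only, that a model can look accurate within a trial yet carry no signal for another.

```python
import random


def simulate_trial(n_patients, weights, rng):
    """Simulate one hypothetical trial as a list of (features, outcome) pairs.

    Assumption for illustration: the outcome depends on a trial-specific
    weighting of the features, so each trial has its own predictive signal.
    """
    rows = []
    for _ in range(n_patients):
        features = [rng.gauss(0.0, 1.0) for _ in weights]
        score = sum(w * x for w, x in zip(weights, features))
        rows.append((features, 1 if score > 0 else 0))
    return rows


def fit_centroids(rows):
    """Fit a nearest-centroid classifier: one mean feature vector per outcome."""
    sums, counts = {}, {}
    for features, outcome in rows:
        total = sums.setdefault(outcome, [0.0] * len(features))
        for i, x in enumerate(features):
            total[i] += x
        counts[outcome] = counts.get(outcome, 0) + 1
    return {y: [v / counts[y] for v in total] for y, total in sums.items()}


def predict(centroids, features):
    """Return the outcome whose centroid is closest to the feature vector."""
    def squared_distance(outcome):
        return sum((c - x) ** 2 for c, x in zip(centroids[outcome], features))
    return min(centroids, key=squared_distance)


def accuracy(centroids, rows):
    """Fraction of patients whose outcome is predicted correctly."""
    return sum(predict(centroids, f) == y for f, y in rows) / len(rows)


def within_trial_accuracy(trials):
    """Train on the first half of each trial and test on its second half."""
    results = {}
    for name, rows in trials.items():
        half = len(rows) // 2
        results[name] = accuracy(fit_centroids(rows[:half]), rows[half:])
    return results


def leave_one_trial_out_accuracy(trials):
    """Train on all trials except one, then test on the held-out trial."""
    results = {}
    for held_out, test_rows in trials.items():
        train_rows = [row for name, rows in trials.items()
                      if name != held_out for row in rows]
        results[held_out] = accuracy(fit_centroids(train_rows), test_rows)
    return results


rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = {
    "trial_A": simulate_trial(1000, [1.0, 0.0, 0.0], rng),
    "trial_B": simulate_trial(1000, [0.0, 1.0, 0.0], rng),
    "trial_C": simulate_trial(1000, [0.0, 0.0, 1.0], rng),
}
within = within_trial_accuracy(trials)
across = leave_one_trial_out_accuracy(trials)
for name in trials:
    print(f"{name}: within-trial {within[name]:.2f}, held-out {across[name]:.2f}")
```

In this synthetic setup the within-trial accuracy should be high and the held-out accuracy should sit near chance, both by construction. Real trial data would not separate so cleanly, and the sketch says nothing about why the models in the study failed to generalize.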
