Chronic kidney disease (CKD) is a complex condition marked by a gradual decline in kidney function, which can ultimately progress to end-stage renal disease (ESRD). Globally, the prevalence of CKD ranges from 8% to 16%, with about 5% to 10% of those diagnosed eventually reaching ESRD, making it a major public health challenge.
In a new study, researchers used machine learning and deep learning models, as well as explainable artificial intelligence (AI), to assess integrated clinical and claims data with the goal of improving prediction of CKD’s progression to ESRD. The integrated models outperformed single data source models, a result that could enhance CKD management, support targeted interventions, and reduce health-care disparities.
The study, by researchers at Carnegie Mellon University, appears in the Journal of the American Medical Informatics Association.
“Our study presents a robust framework for predicting ESRD outcomes, improving clinical decision-making through integrated multisourced data and advanced analytics,” explains Rema Padman, professor of management science and healthcare informatics at Carnegie Mellon’s Heinz College, who led the study. “Future research will expand data integration and extend this framework to other chronic diseases.”
The progression of CKD is classified into five stages, culminating in ESRD, when kidney function drops to 10% to 15% of normal capacity, necessitating dialysis or transplantation for patient survival. The economic impact of CKD is significant: a relatively small proportion of U.S. Medicare patients with CKD accounts for a disproportionately high share of Medicare expenses, especially when they progress to ESRD. In addition, more than a third of ESRD patients are readmitted within 30 days of discharge. These figures underscore the critical need for early detection and management of the disease to prevent its progression to ESRD, improve patient health outcomes, and reduce health-care costs.
In this study, researchers used data from more than 10,000 CKD patients, combining clinical and claims information from 2009 to 2018. They evaluated multiple statistical, machine learning, and deep learning models using five distinct observation windows. Their work was supported by explainable AI to enhance interpretability and reduce bias.
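To make the idea of an observation window concrete, the sketch below shows one way clinical and claims records could be restricted to a fixed window and merged into a single feature set per patient. It is a minimal illustration under stated assumptions, not the study's pipeline: the record layout, the field names (`name`, `value`, `code`, `date`), the feature prefixes, and the choice to anchor the window at a start date are all hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` after `d`, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def build_features(clinical, claims, start, window_months):
    """Combine clinical and claims records that fall inside [start, start + window).

    Hypothetical layout: clinical records carry a lab `name` and `value`;
    claims records carry a billing `code`. Both carry a `date`.
    """
    end = add_months(start, window_months)
    features = {}

    # Clinical side: keep the most recent value of each lab inside the window.
    for rec in sorted(clinical, key=lambda r: r["date"]):
        if start <= rec["date"] < end:
            features["lab_" + rec["name"]] = rec["value"]

    # Claims side: count how often each code appears inside the window.
    for rec in claims:
        if start <= rec["date"] < end:
            key = "claim_" + rec["code"]
            features[key] = features.get(key, 0) + 1

    return features


# Illustrative records for one made-up patient.
clinical = [
    {"date": date(2010, 3, 1), "name": "egfr", "value": 55},
    {"date": date(2011, 6, 1), "name": "egfr", "value": 48},
    {"date": date(2012, 3, 1), "name": "egfr", "value": 40},
]
claims = [
    {"date": date(2010, 2, 1), "code": "nephrology_visit"},
    {"date": date(2011, 1, 1), "code": "nephrology_visit"},
    {"date": date(2013, 1, 1), "code": "nephrology_visit"},
]

short_window = build_features(clinical, claims, date(2010, 1, 15), 6)
long_window = build_features(clinical, claims, date(2010, 1, 15), 24)
```

A longer window captures more of the patient's history but delays the point at which a prediction can be made, which is the trade-off the researchers examined across their five windows.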
The study’s integrated data models outperformed single data source models. A 24-month observation window optimally balanced early detection and prediction accuracy. The 2021 estimated glomerular filtration rate equation improved prediction accuracy and reduced racial bias, particularly for African American patients.
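The article does not name the equation, but the 2021 revision in common clinical use is the CKD-EPI creatinine equation, which estimates kidney function from serum creatinine, age, and sex without a race coefficient. The sketch below assumes that is the equation meant; the function name and argument names are illustrative, serum creatinine is taken in mg/dL, and the result is in mL/min/1.73 m².

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl, age_years, is_female):
    """Estimate GFR with the 2021 CKD-EPI creatinine equation (no race term).

    Assumption: this is the "2021 eGFR equation" the article refers to.
    """
    # Sex-specific constants from the published equation.
    kappa = 0.7 if is_female else 0.9
    alpha = -0.241 if is_female else -0.302

    ratio = serum_creatinine_mg_dl / kappa
    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    if is_female:
        egfr *= 1.012
    return egfr


# Example: a 50-year-old man with a serum creatinine of 0.9 mg/dL.
example = egfr_ckd_epi_2021(0.9, 50, is_female=False)
```

Because the equation takes no race input, two patients with the same creatinine, age, and sex receive the same estimate, which is the property relevant to the bias reduction the study reports.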
“Our work bridges a critical gap by developing a framework that uses integrated clinical and claims data rather than isolated data sources,” notes Yubo Li, a PhD student at Carnegie Mellon’s Heinz College, who coauthored the study. “By minimizing the observation window needed for accurate predictions, our approach balances clinical relevance with patient-centered practicality; this integration enhances both predictive accuracy and clinical utility, enabling more informed decision-making to improve patient outcomes.”
Among the study’s limitations, the authors say their reliance on data from one institution may limit the generalizability of their model to other care settings. In addition, their use of data from electronic health records can introduce observational bias, incomplete records, and underrepresentation of certain patient groups, which can undermine both accuracy and fairness.
Article Title: Enhancing end-stage renal disease outcome prediction: a multisourced data-driven approach
Article Publication Date: 6-Aug-2025