Overview of the data-driven clustering and classification framework for diabetes risk stratification
Caption
Electronic health record (EHR) data from 51,400 participants, including demographic characteristics, clinical information, anthropometric measures, and blood test indicators, were analyzed. Outcome indicators (diabetes, cardiovascular disease, fatty liver disease, and stroke) were first used to derive pseudo labels through ensemble clustering. These pseudo labels were then integrated with general clinical indicators (e.g., age, sex, BMI, waist circumference, lipids, liver and kidney function markers, blood pressure, and heart rate) to train a weighted naive Bayesian classifier, which categorized individuals into three distinct clusters. Model performance was subsequently verified using an independent validation cohort.
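The two-stage workflow described in the caption can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The data are synthetic. The ensemble step is simplified to repeated k-means runs combined by majority vote. The classifier is a Gaussian naive Bayes with per-feature weights applied to the log-likelihood. All function names, feature weights, and parameter values are hypothetical.

```python
import math
import random
from collections import Counter

def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, iters=15):
    """Plain k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # Relabel clusters by ascending outcome burden so labels line up across runs.
    order = sorted(range(k), key=lambda c: sum(centers[c]))
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels]

def ensemble_pseudo_labels(outcomes, k=3, runs=7, seed=0):
    """Simplified ensemble clustering: majority vote over several k-means runs."""
    votes = [kmeans(outcomes, k, random.Random(seed + r)) for r in range(runs)]
    return [Counter(col).most_common(1)[0][0] for col in zip(*votes)]

def fit_gaussian_nb(X, y):
    """Per-class log prior plus per-feature (mean, variance) estimates."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        stats = []
        for col in zip(*rows):
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mu, var))
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model

def predict(model, weights, x):
    """Weighted naive Bayes: each feature's log-likelihood is scaled by its weight."""
    def score(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            w * (-0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var))
            for w, (mu, var), v in zip(weights, stats, x)
        )
    return max(model, key=score)

# Synthetic stand-in for EHR data (assumption: not the study's data).
rng = random.Random(42)
outcomes, features = [], []
for _ in range(150):
    g = rng.randrange(3)  # hidden risk group, used only to simulate data
    p = (0.05, 0.35, 0.75)[g]
    # Four binary outcome indicators, e.g. diabetes, CVD, fatty liver, stroke.
    outcomes.append([float(rng.random() < p) for _ in range(4)])
    # Two general clinical indicators: age and BMI.
    features.append([rng.gauss(45 + 10 * g, 8), rng.gauss(23 + 3 * g, 2.5)])

pseudo = ensemble_pseudo_labels(outcomes, k=3, runs=7, seed=0)  # stage 1
weights = [1.0, 0.5]  # hypothetical feature weights for (age, BMI)
model = fit_gaussian_nb(features, pseudo)                       # stage 2
predicted = [predict(model, weights, x) for x in features]
print(Counter(pseudo), Counter(predicted))
```

In this sketch the classifier sees only the general clinical indicators at prediction time, while the outcome indicators are used solely to produce the pseudo labels, mirroring the separation described in the caption. A held-out cohort would be scored with `predict` in the same way.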
Credit
Dr. Lixin Guo from Beijing Hospital, China and Dr. Tong Jia from Peking University, China
Usage Restrictions
Credit must be given to the creator. Cannot be used without permission.
License
Original content