image: AI Model Boosts Preterm Birth Prediction Accuracy to nearly 90%
Credit: BGI Genomics
A recent study developed a highly accurate risk prediction framework for preterm birth (PTB) that could broaden the potential of AI-driven multi-omics applications in precision obstetrics and biomedical research.
The model is the first to integrate genomics, transcriptomics, and large language models (LLMs) for PTB risk prediction, and the results point to its effectiveness and potential for clinical use.
The research was conducted by a collaborative team led by BGI Genomics, together with Professor Huang Hefeng's team, Shenzhen Longgang Maternal and Child Health Hospital, Fujian Maternity and Child Health Hospital, and OxTium Technology. It was published in npj Digital Medicine on August 20, 2025.
A Global Challenge
PTB is a leading cause of maternal and neonatal morbidity and mortality worldwide. Each year, around 15 million babies are born prematurely, accounting for roughly 11% of all births worldwide, according to a review study. The earlier a baby is born, the greater the health risks.
Despite extensive research and interventions, the incidence of PTB remains high, posing a persistent challenge in modern obstetrics.
Identifying high-risk pregnancies early and accurately is critical, but predicting PTB remains difficult because its causes are complex and multifactorial. No single marker has been sufficient to determine risk accurately.
With the rapid advancement of large language models and the growing use of multi-omics data, researchers are now exploring new approaches to disease risk prediction.
Multi-Omics + AI
This study introduced GeneLLM, a gene-focused large language model designed to interpret complex biological data. By analyzing genetic material circulating in the mother's blood—cell-free DNA (cfDNA) and cell-free RNA (cfRNA)—the researchers built predictive models capable of identifying women at risk of PTB.
This nested case-control study enrolled 682 pregnant women and collected plasma samples for cfRNA and cfDNA sequencing. Three predictive models were built using different data inputs: cfDNA only, cfRNA only, and integrated cfDNA + cfRNA.
Using a Transformer-based architecture, all three models achieved an area under the receiver operating characteristic curve (AUC) above 0.80. The cfDNA model achieved an AUC of 0.822 and the cfRNA model an AUC of 0.851, while the integrated cfDNA + cfRNA model achieved the highest AUC, 0.89. AUC measures how well a model separates cases from non-cases: 0.5 corresponds to chance and 1.0 to perfect discrimination, so a value closer to 1.0 indicates a more reliable model.
The rise to an AUC of nearly 0.90 when cfDNA and cfRNA were combined made integration the most powerful approach and indicates that the two data types capture complementary biological information.
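For readers unfamiliar with the metric, the sketch below shows one common way to compute AUC: as the fraction of (case, non-case) pairs in which the case receives the higher risk score, with ties counted as half. It is an illustration only, not the study's evaluation code; the function name `auc`, the labels, and the scores are all hypothetical.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: counted as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up toy data (NOT from the study): 1 = preterm, 0 = term.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print(round(auc(labels, scores), 3))  # 11 of 12 pairs ranked correctly -> 0.917
```

In this toy example only one pair is misordered (the preterm case scored 0.4 against the term case scored 0.6), so the AUC is 11/12. A model that ranked cases and non-cases at random would score about 0.5.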
New Molecular Insight: RNA Editing
Importantly, RNA editing levels were markedly higher in preterm cases, and models based on RNA editing features achieved an AUC of 0.82, outperforming single-omics models. These findings suggest a potential mechanistic role for RNA editing in PTB and provide new molecular insight.
Dr. Zhou Si, Chief Scientist at BGI Genomics' IIMR and first author of the study, explained: "Our study shows that integrating cfDNA and cfRNA with LLM outperforms conventional methods in predicting PTB. Importantly, the model is efficient, resource-light, and ready for clinical translation. Beyond prediction, our findings also reveal RNA editing as a promising new target for understanding and regulating PTB."
This research shows how AI and multi-omics integration can reshape risk prediction in prenatal medicine. By raising the AUC for PTB prediction to nearly 0.90, the combination of multi-omics data and LLMs represents a major step toward earlier identification of at-risk pregnancies, earlier intervention, and improved maternal and neonatal outcomes.
About BGI Genomics
BGI Genomics, headquartered in Shenzhen, China, is the world's leading integrated solutions provider of precision medicine. Our services cover more than 100 countries and regions, involving more than 2,300 medical institutions. In July 2017, as a subsidiary of BGI Group, BGI Genomics (300676.SZ) was officially listed on the Shenzhen Stock Exchange.
Journal
npj Digital Medicine
Article Title
A novel sequence-based transformer model architecture for integrating multi-omics data in preterm birth risk prediction
Article Publication Date
20-Aug-2025