News Release

Colorectal cancer survival predicted by AI using clinical and molecular features

“The clinical and biological features proposed here in conjunction with ML can improve the interpretation of CRC mechanisms and predict patient survival”

Peer-Reviewed Publication

Impact Journals LLC

Machine learning-based survival prediction in colorectal cancer combining clinical and biological features

Figure 1: LASSO feature ranking and SHAP explanations for the Case 1, 2, and 3 feature selection models.
A positive SHAP value indicates a positive impact on the prediction, leading the model to predict 1 (patient survival). A negative value indicates an adverse effect, leading the model to predict 0 (patient non-survival). The color of each SHAP data point shows the feature value as a heatmap, where blue is the lowest value (e.g., 0) and red is the highest value (e.g., 1). For Cases 1 and 2, pathological stage and E2F8 expression are the most relevant clinical and biological features, respectively. For Case 3, pathological stage and hsa-miR-495-3p expression are the most relevant features.
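As a rough illustration of how the sign of a SHAP value is read, the sketch below computes SHAP values for a simple linear model, where (assuming independent features) each value reduces to the feature's weight times its deviation from the average. This is not the study's model or code: the feature names, weights, bias, and patient values are all invented for the example, and the score is a raw model output rather than a final 0/1 prediction.

```python
def linear_score(weights, bias, row):
    """Raw score of a linear model: bias + sum(w_i * x_i)."""
    return bias + sum(w * x for w, x in zip(weights, row))


def linear_shap_values(weights, row, background_means):
    """SHAP values for a linear model with independent features:
    phi_i = w_i * (x_i - mean_i)."""
    return [w * (x - m) for w, x, m in zip(weights, row, background_means)]


# Hypothetical setup (not from the study): two features scaled to [0, 1].
feature_names = ["stage_scaled", "marker_expression"]
weights = [-2.0, -1.0]          # invented weights
bias = 1.5                      # invented bias
background_means = [0.5, 0.5]   # invented cohort averages
patient = [1.0, 0.2]            # invented patient: late stage, low expression

phi = linear_shap_values(weights, patient, background_means)
for name, value in zip(feature_names, phi):
    direction = "toward 1 (survival)" if value > 0 else "toward 0 (non-survival)"
    print(f"{name}: SHAP {value:+.2f} pushes the score {direction}")

# The SHAP values add up to the gap between this patient's score
# and the score of an average patient.
gap = linear_score(weights, bias, patient) - linear_score(weights, bias, background_means)
print(f"sum of SHAP values = {sum(phi):+.2f}, score gap = {gap:+.2f}")
```

For tree ensembles such as the boosting models mentioned in the release, SHAP values are computed differently, but the reading of the sign is the same.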

Credit: Copyright: © 2025 Vieira et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

BUFFALO, NY – December 17, 2025 – A new research paper was published in Oncotarget (Volume 16) on December 15, 2025, titled “Machine learning-based survival prediction in colorectal cancer combining clinical and biological features.”

In this study, led by Lucas M. Vieira from the University of Brasília and the University of California San Diego, researchers used machine learning to predict survival in patients with colorectal cancer. They built a model by combining biological markers with clinical data. This approach could help improve prognosis and guide treatment strategies for one of the world’s most common and deadly cancers.

The team analyzed data from over 500 patients, using clinical details such as age, chemotherapy status, and cancer stage, along with molecular features such as gene expression and microRNAs. Their goal was to improve how clinicians identify high-risk patients and to make outcome predictions more precise. The researchers evaluated three patient data scenarios using several machine learning techniques. The best-performing model was an adaptive boosting model, which achieved 89.58% accuracy. The results showed that integrating clinical and biological data led to significantly better predictions than using either data type alone.
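The idea of boosting on combined clinical and molecular features can be sketched in a few lines. The example below is a minimal, from-scratch adaptive boosting classifier built from decision stumps and trained on synthetic data; it is not the study's pipeline. The data generator, the two features (a stage from 1 to 4 and a made-up expression score), the survival rule, and all parameter values are assumptions made for illustration, and the sketch does not attempt to reproduce the reported accuracy.

```python
import math
import random


def make_synthetic_patients(n, seed=0):
    """Hypothetical data: column 0 is a clinical feature (stage 1-4),
    column 1 is a molecular feature (invented expression score).
    Label 1 = survival, 0 = non-survival, from an invented rule."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        stage = rng.randint(1, 4)
        expression = rng.gauss(0.0, 1.0)
        risk = 0.9 * (stage - 2.5) + 0.8 * expression + rng.gauss(0.0, 0.5)
        rows.append((float(stage), expression))
        labels.append(1 if risk < 0 else 0)
    return rows, labels


def select(rows, cols):
    """Keep only the listed feature columns."""
    return [tuple(r[c] for c in cols) for r in rows]


def stump_predict(stump, row):
    """A stump votes +1 or -1 depending on one feature and one threshold."""
    feature, threshold, polarity = stump
    return polarity if row[feature] <= threshold else -polarity


def fit_stump(rows, signs, weights):
    """Find the stump with the lowest weighted classification error."""
    best, best_err = None, float("inf")
    for feature in range(len(rows[0])):
        for threshold in sorted({r[feature] for r in rows}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for r, s, w in zip(rows, signs, weights)
                          if stump_predict(stump, r) != s)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def fit_adaboost(rows, labels, rounds=15):
    """Discrete AdaBoost: reweight patients so later stumps focus on mistakes."""
    signs = [1 if y == 1 else -1 for y in labels]
    weights = [1.0 / len(rows)] * len(rows)
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(rows, signs, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * s * stump_predict(stump, r))
                   for r, s, w in zip(rows, signs, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps, mapped back to a 0/1 label."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else 0


def accuracy(ensemble, rows, labels):
    return sum(predict(ensemble, r) == y for r, y in zip(rows, labels)) / len(rows)


train_rows, train_y = make_synthetic_patients(150, seed=1)
test_rows, test_y = make_synthetic_patients(150, seed=2)
for name, cols in [("clinical only", [0]), ("molecular only", [1]), ("combined", [0, 1])]:
    model = fit_adaboost(select(train_rows, cols), train_y)
    acc = accuracy(model, select(test_rows, cols), test_y)
    print(f"{name}: held-out accuracy {acc:.2%}")
```

Because the synthetic survival rule depends on both features, a model that sees only one of them has less information to work with, which mirrors the release's point about integrating both data types. The numbers printed here reflect the invented data only.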

Among the biological markers, the gene E2F8 was consistently influential in all patient groups and is known to play a role in tumor growth. Other important markers included WDR77 and hsa-miR-495-3p, which are also associated with cancer development. Key clinical predictors included cancer stage, patient age, lymph node involvement, and whether chemotherapy was administered.

“The proposed method combines biological and clinical features to predict patient survival, using as input data from patients from the United States, available in the TCGA database.”

Unlike earlier models that relied on either clinical or molecular data alone, this study demonstrates the added value of combining both. Ensemble methods, which merge multiple learning algorithms, provided more stable and consistent results across all patient groups tested.

These research findings could lead to new tools that help clinicians better predict how a patient’s disease might progress or respond to treatment. The study also highlights the importance of collecting complete clinical information, such as lifestyle factors, which were missing from the dataset but could enhance future predictions.

Overall, the study demonstrated how machine learning can support more accurate and personalized survival predictions in colorectal cancer. It also points to potential future research on markers like E2F8, which may be useful for monitoring or targeted therapy.

DOI: https://doi.org/10.18632/oncotarget.28783

Correspondence to: Lucas M. Vieira – lvieira@health.ucsd.edu

Keywords: cancer, colorectal cancer, machine learning, feature selection, non-coding RNAs, genes

________

About Oncotarget:

Oncotarget (a primarily oncology-focused, peer-reviewed, open access journal) aims to maximize research impact through insightful peer-review; eliminate borders between specialties by linking different fields of oncology, cancer research and biomedical sciences; and foster application of basic and clinical science.

Oncotarget is indexed and archived by PubMed/Medline, PubMed Central, Scopus, EMBASE, META (Chan Zuckerberg Initiative) (2018-2022), and Dimensions (Digital Science).

To learn more about Oncotarget, visit Oncotarget.com and connect with us on social media: X, Facebook, YouTube, Instagram, LinkedIn, Pinterest, and Spotify. Oncotarget is also available wherever you listen to podcasts.

For media inquiries, please contact media@impactjournals.com

