News Release

Revolutionizing disease classification and identifying hidden disease patterns

Peer-Reviewed Publication

The Hebrew University of Jerusalem

Researchers have developed a machine learning approach to identify potential subtypes in diseases, significantly enhancing disease classification and treatment strategies. The model, which achieved an 89.4% ROC AUC, uncovered 515 previously unannotated disease subtypes, demonstrating the potential for more precise and personalized medical treatments.


Researchers from the Hebrew University of Jerusalem have developed a machine learning approach to identify diseases that are likely to have subtypes, a step toward better disease classification and treatment strategies. The study, led by PhD student Dan Ofer and Prof. Michal Linial from the Department of Biological Chemistry at The Life Science Institute at Hebrew University, marks a significant advancement in the use of artificial intelligence in medical research.

Dividing diseases into distinct subtypes is pivotal for accurate study and effective treatment. The Open Targets Platform integrates biomedical, genetic, and biochemical datasets to support disease ontologies, classifications, and potential gene targets. However, many disease annotations remain incomplete, and completing them often requires extensive expert medical input. This challenge is especially significant for rare and orphan diseases, where resources are limited.

The research introduces a novel machine learning approach to identify diseases with potential subtypes. Using the approximately 23,000 diseases documented in the Open Targets Platform, the researchers derived new features to predict diseases with subtypes using direct evidence. Machine learning models were then applied to analyze feature importance and evaluate predictive performance, uncovering both known and novel disease subtypes.

The model achieved an area under the receiver operating characteristic curve (ROC AUC) of 89.4% in identifying known disease subtypes. The integration of pre-trained deep-learning language models further enhanced the model's performance. Notably, the research identified 515 disease candidates predicted to possess previously unannotated subtypes, paving the way for new insights into disease classification.
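For readers unfamiliar with the metric, ROC AUC can be read as the probability that a model scores a randomly chosen positive example (here, a disease known to have subtypes) higher than a randomly chosen negative one. The short Python sketch below illustrates that definition only. The function name `roc_auc`, the labels, and the scores are hypothetical and are not taken from the study or from the Open Targets Platform.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win. This pairwise definition is equivalent to the
    area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = disease annotated with subtypes, 0 = not annotated.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
# Hypothetical model scores, one per disease (higher = more likely to have subtypes).
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]

auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.4f}")
```

In this toy example, 15 of the 16 positive-negative pairs are ranked correctly, giving a ROC AUC of 0.9375. A value of 0.5 would correspond to random ranking and 1.0 to a perfect one.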

"This project demonstrates the incredible potential of machine learning in expanding our understanding of complex diseases," said Dan Ofer. "By leveraging advanced models, we can uncover patterns and subtypes that were previously hidden, ultimately contributing to more precise and personalized treatments."

This innovative methodology enables a robust and scalable approach for improving knowledge-based annotations and provides a comprehensive assessment of disease ontology tiers. "We are excited about the potential of our machine learning approach to revolutionize disease classification," said Prof. Michal Linial. "Our findings can significantly contribute to personalized medicine, offering new avenues for therapeutic development."
