Article Highlight | 8-Jul-2025

Rethinking machine learning for frontier science

King Abdullah University of Science & Technology (KAUST)

A novel way to train AI models that empowers them to better assist in cutting-edge research has been developed by researchers at KAUST [1]. The new machine learning method enables accurate AI prediction even in frontier areas of science where only very limited data is available to train the model.

“The new method is already generating new leads in the development of sustainable aviation fuel (SAF), potentially helping to overcome a major challenge in the clean energy transition,” says the lead author of the study, Basem Eraqi, a Ph.D. student in the Clean Energy Research Platform, led by Mani Sarathy.

AI models with property prediction capabilities could dramatically accelerate the discovery of molecules with advanced performance for a specific task. “To build such models, conventional machine learning techniques typically require large, well-balanced datasets to achieve reliable performance,” Eraqi says. However, in many cases — including the development of new pharmaceuticals and polymers, as well as sustainable aviation fuels — there is very little data available for each molecular property of interest.

“Our goal was to develop a machine learning method that performs well even in this ultra-low-data regime, enabling performant material discovery in data-scarce domains,” Eraqi says.

The team based their approach on a method called multi-task learning (MTL), which trains a model to predict multiple properties at once. “The core idea is that, by learning related tasks simultaneously, the model can extract and reuse shared patterns in the data,” Eraqi explains. A molecule’s flammability limits, for example, are related to its volatility, and so learning these properties together can enhance the model’s predictive performance.
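As a rough illustration of the idea, the sketch below shows a toy multi-task model in Python: one shared layer feeds a separate output "head" per property, and the loss skips any property a molecule has no measurement for. It is not the KAUST team's model. The class name, function name, layer sizes and the two task names are assumptions chosen to mirror the flammability and volatility example above.

```python
import math
import random
from typing import Optional


class TinyMultiTaskModel:
    """Toy multi-task model: one shared layer, one linear head per property.

    Hypothetical illustration only, not the architecture used in the study.
    """

    def __init__(self, n_features: int, n_hidden: int, task_names: list[str], seed: int = 0):
        rng = random.Random(seed)
        # Shared weights: every task reads the same learned representation.
        self.shared = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(n_hidden)]
        # Task-specific weights: one small head per predicted property.
        self.heads = {name: [rng.gauss(0, 0.1) for _ in range(n_hidden)] for name in task_names}

    def embed(self, x: list[float]) -> list[float]:
        """Map molecular features to the shared hidden representation."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.shared]

    def predict(self, x: list[float]) -> dict[str, float]:
        """Predict every property at once from the shared representation."""
        h = self.embed(x)
        return {name: sum(w * hi for w, hi in zip(head, h)) for name, head in self.heads.items()}


def masked_mse(
    predictions: list[dict[str, float]],
    targets: list[dict[str, Optional[float]]],
) -> dict[str, float]:
    """Per-task mean squared error, ignoring molecules with no label (None) for a task."""
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for pred, target in zip(predictions, targets):
        for task, y in target.items():
            if y is None:
                continue  # no measurement for this property on this molecule
            sums[task] = sums.get(task, 0.0) + (pred[task] - y) ** 2
            counts[task] = counts.get(task, 0) + 1
    return {task: sums[task] / counts[task] for task in sums}


model = TinyMultiTaskModel(n_features=3, n_hidden=4, task_names=["flammability_limit", "volatility"])
example_prediction = model.predict([0.1, 0.2, 0.3])
```

Because both heads read the same shared layer, a training signal from one property also shapes the representation used by the other, which is where the benefit of learning related tasks together would come from.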

The smaller or more imbalanced the dataset used for MTL, however, the greater the chance of ‘negative transfer’, where the model learns spurious connections between tasks that degrade its predictive performance.

To protect against negative transfer, the team developed a novel training scheme called Adaptive Checkpointing with Specialization (ACS). “ACS monitors each task’s performance and preserves the best-performing model state for that task, allowing for safe and effective knowledge sharing,” Eraqi says. By mitigating negative transfer, ACS can improve the accuracy and stability of molecular property predictions.
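The per-task bookkeeping Eraqi describes can be sketched in a few lines of Python. This is a minimal sketch based only on the description above, not the published ACS implementation: the `PerTaskCheckpointer` class, the split into "shared" and "head" state, and the validation losses are all hypothetical stand-ins.

```python
import copy


class PerTaskCheckpointer:
    """Keep, for each task, a snapshot of the model state with the lowest validation loss.

    Hypothetical helper illustrating per-task checkpointing; the real ACS
    scheme may differ in what it monitors and what it stores.
    """

    def __init__(self, task_names):
        self.best_loss = {task: float("inf") for task in task_names}
        self.best_epoch = {task: None for task in task_names}
        self.best_state = {task: None for task in task_names}

    def update(self, epoch, val_losses, shared_state, head_states):
        """Snapshot state for every task whose validation loss just improved."""
        improved = []
        for task, loss in val_losses.items():
            if loss < self.best_loss[task]:
                self.best_loss[task] = loss
                self.best_epoch[task] = epoch
                # Deep copy so later training steps cannot overwrite the snapshot.
                self.best_state[task] = copy.deepcopy(
                    {"shared": shared_state, "head": head_states[task]}
                )
                improved.append(task)
        return improved


# Simulated run (made-up numbers): "flammability_limit" is best at epoch 1 and
# then degrades, as it might under negative transfer; "volatility" keeps improving.
shared = {"w": [0.0]}
heads = {"volatility": {"b": 0.0}, "flammability_limit": {"b": 0.0}}
history = [
    {"volatility": 0.9, "flammability_limit": 0.8},
    {"volatility": 0.6, "flammability_limit": 0.5},
    {"volatility": 0.4, "flammability_limit": 0.7},
]

checkpointer = PerTaskCheckpointer(list(heads))
for epoch, val_losses in enumerate(history):
    # Stand-in for a real training step: mutate the parameters in place.
    shared["w"][0] = float(epoch)
    for head in heads.values():
        head["b"] = float(epoch)
    checkpointer.update(epoch, val_losses, shared, heads)
```

In this toy run each property ends up with the snapshot from its own best epoch, so a task that later degrades keeps its earlier, better state while the other task continues to benefit from shared training.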

The team tested ACS on predicting the properties of potential SAF components. “SAF development is a high-impact, real-world challenge where experimental data is extremely limited and labour-intensive to obtain,” Eraqi says. ACS delivered robust and accurate predictions across 15 SAF properties, consistently outperforming conventional models. It performed especially well in ultra-low-data settings: with as few as 29 training data points, it achieved over 20% higher predictive accuracy than conventional training methods.

“The model’s accurate predictions are already helping to accelerate the discovery and development of new SAF blends,” Sarathy says. “We are applying the ACS methodology to predict several dozen SAF-relevant properties that can impact aircraft emissions and efficiency,” he adds. “These property predictions are then being fed into a fuel design tool targeting novel SAF formulations for an industrial partner.”

The team has also tested ACS on pharmaceutical and molecular toxicity datasets, confirming that it delivered significant predictive accuracy improvements over conventional training methods.

Reference

Eraqi, B. A., Khizbullin, D., Nagaraja, S. S. & Sarathy, S. M. Molecular property prediction in the ultra-low data regime. Communications Chemistry 8, 201 (2025).
