News Release 1-Aug-2025

Making molecules make sense: A regional explanation method reveals structure–property relationships

Peer-Reviewed Publication

Intelligent Computing

Examples of sequence, image and graph representations.
**Two examples of the same molecule represented in sequence, image and graph formats are provided for direct comparison.**

Credit: Xin Wang et al.

In cheminformatics, machine learning is transforming how molecular properties are predicted and explained, but a critical challenge remains: making these powerful but often “black box” models easier to interpret. Researchers at the Australian National University have now developed a solution: a “regional explanation” method that helps reveal how molecular structures drive their properties. The research was published June 3 in Intelligent Computing, a Science Partner Journal, in an article titled “Regional Explanations and Diverse Molecular Representations in Cheminformatics: A Comparative Study.”

The new regional explanation method bridges the gap between local explanations, which account for individual predictions, and global explanations, which summarize a model's overall behavior. This lets it capture nonlinear relationships between molecular features and properties. The authors found that different molecular representations yielded consistent regional explanations, and that the method offers fine-grained, chemically meaningful insights often missed by traditional explanation methods. It was validated on 2 datasets, demonstrating applicability across different chemical domains.
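To make the local, regional and global distinction concrete, the sketch below averages per-molecule feature attributions within groups of molecules. It is an illustration only and not the authors' algorithm: the function names, the feature name `oxygen_fraction`, the region labels and the attribution values are all hypothetical, and simple averaging is an assumed aggregation rule.

```python
from collections import defaultdict
from statistics import mean


def aggregate_attributions(local_attributions, regions):
    """Average per-sample feature attributions within each region.

    local_attributions: list of dicts mapping feature name -> attribution.
    regions: list of region labels, one per sample (assumed to be given).
    """
    if len(local_attributions) != len(regions):
        raise ValueError("need exactly one region label per sample")
    grouped = defaultdict(list)
    for attribution, region in zip(local_attributions, regions):
        grouped[region].append(attribution)
    regional = {}
    for region, rows in grouped.items():
        features = sorted({name for row in rows for name in row})
        # A feature missing from a sample is treated as zero attribution.
        regional[region] = {
            name: mean(row.get(name, 0.0) for row in rows) for name in features
        }
    return regional


def global_attributions(local_attributions):
    """Global view: one region that contains every sample."""
    labels = ["all"] * len(local_attributions)
    return aggregate_attributions(local_attributions, labels)["all"]


# Hypothetical attributions for one feature across four molecules.
local = [
    {"oxygen_fraction": 0.25},
    {"oxygen_fraction": 0.75},
    {"oxygen_fraction": -0.5},
    {"oxygen_fraction": -0.5},
]
regions = ["low_oxygen", "low_oxygen", "high_oxygen", "high_oxygen"]

print(aggregate_attributions(local, regions))
print(global_attributions(local))
```

In this toy data the feature's effect changes sign between the two groups, so the global average cancels to zero while the per-region averages keep the pattern visible. That is one way a regional view can expose a nonlinear relationship that a single global summary hides.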

To develop and validate this method, the researchers chose a dataset of 2,384 graphene oxide nanoflakes, each annotated with 783 molecular features used to predict formation energy, a key indicator of thermodynamic stability. After removing duplicates, 2,116 molecules remained. The researchers tested their method on 4 different data representations of these molecules, pairing each representation with an appropriate machine learning model: a multi-layer perceptron for the tabular representation, a transformer for sequences, a convolutional neural network for images and a graph convolutional network for graph data. To ensure robust and fair comparisons, missing values were handled and the data were normalized. Both local explanation methods and the regional explanation approach were used to interpret model predictions. The analysis showed that the predictive features identified by the new approach reflected established chemical knowledge about properties related to formation energy. The method's generalizability was then checked on the Quantum Machine 9 (QM9) dataset, a larger and more chemically diverse benchmark, where the results supported those obtained on the real-world graphene oxide nanoflake dataset.
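The data preparation described above (removing duplicates, handling missing values, normalizing) can be sketched as follows. The release does not say which techniques were used, so exact-row deduplication, column-mean imputation and z-score normalization are assumptions chosen for illustration, and the function name `preprocess` is hypothetical.

```python
from statistics import mean, pstdev


def preprocess(rows):
    """Deduplicate, impute and normalize a table of numeric features.

    rows: list of equal-length lists; None marks a missing value.
    Returns a new list of rows and leaves the input unchanged.
    """
    # Step 1: drop exact duplicate rows, keeping the first occurrence.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    if not unique:
        return []

    n_cols = len(unique[0])

    # Step 2: fill each missing value with its column mean (assumed strategy).
    for j in range(n_cols):
        observed = [row[j] for row in unique if row[j] is not None]
        fill = mean(observed) if observed else 0.0
        for row in unique:
            if row[j] is None:
                row[j] = fill

    # Step 3: z-score each column; constant columns become all zeros.
    for j in range(n_cols):
        column = [row[j] for row in unique]
        mu, sigma = mean(column), pstdev(column)
        for row in unique:
            row[j] = (row[j] - mu) / sigma if sigma > 0 else 0.0

    return unique


# Hypothetical toy table: 4 rows, 2 features, one duplicate, one missing value.
table = [[1.0, 10.0], [1.0, 10.0], [3.0, None], [5.0, 30.0]]
for row in preprocess(table):
    print(row)
```

Applying identical preprocessing before every model is what makes a comparison across the tabular, sequence, image and graph pipelines fair. In a train and test setting, the imputation and scaling statistics would normally be computed on the training split only.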

The researchers believe their regional explanation method could have broad application, from materials design to drug discovery, and could serve as a practical tool to understand complex structure–property relationships. Future work may focus on incorporating automated clustering to better capture property-specific molecular patterns or on adding uncertainty quantification to enhance interpretability.

Journal

Intelligent Computing

DOI

10.34133/icomputing.0126

Article Title

Regional Explanations and Diverse Molecular Representations in Cheminformatics: A Comparative Study

Article Publication Date

3-Jun-2025
