Machine learning is transforming many scientific fields, including computational materials science. For about two decades, scientists have used it to make accurate yet inexpensive calculations of interatomic potentials: mathematical functions that express the energy of a system of atoms and are a key ingredient for simulating and predicting the stability and properties of materials. But machine learning by itself is not a magic wand, and many problems remained. Now a study by Michele Ceriotti’s group at EPFL introduces a new dataset and model that greatly improve the efficiency of machine-learning interatomic potentials (or MLIPs, as experts call them) and their applicability across chemical elements and material classes.
The first generation of MLIPs had to be tuned for each specific physical system, which required complex calculations based on the quantum properties of the system and deep chemical knowledge. So a second generation of general-purpose, or universal, models was developed that could compute interatomic potentials across a wide range of chemical systems with minimal fine-tuning. But these models were not as “universal” as scientists would have wished, mostly because of limitations in the training data.
“When this transition to universal models happened, we found ourselves in a situation where we had the models, but not much data to train them,” says Arslan Mazitov, a scientist in Michele Ceriotti’s Laboratory of Computational Science and Modeling (COSMO) at EPFL.
“The only available data at that point came from materials databases like the Materials Project, which mostly include stable materials with ideal lattices and bulk structures with few modifications. They are not suitable for training universal models. If you want a model that is transferable and can be used to study relevant processes such as surface interactions, surface adsorption and phase changes, you need something different.”
In their new study, published in Nature Communications, the authors introduce PET-MAD, a general-purpose MLIP based on a dataset of their own creation and an original neural network architecture. The first key innovation of the study is a brand-new dataset called Massive Atomistic Diversity (MAD), which includes 95,595 structures spanning 85 elements and covers both organic and inorganic materials, ranging from 3D bulk structures to nanoclusters and molecules.
Unlike previous resources, the dataset was built by recalculating energies and forces with a consistent density functional theory (DFT) setup. “We worked on the data to make it more compact and denser in terms of information than the previous databases, so that it would prove more efficient for training a neural network,” says Mazitov.
The MAD dataset is freely available to everyone through the Materials Cloud archive.
The other key element is the network architecture itself. “In the past, people were keen on building chemical knowledge into their models, for example by imposing constraints based on physical symmetries: in short, making sure that the energy of a molecule does not depend on its orientation in space,” says Mazitov.
“Instead, we decided to create a model that makes fewer a priori assumptions and learns symmetries during training.” Though the model’s predictions are not perfectly invariant to orientation, they are invariant to such a high degree that they can safely be used for robust simulations, and at higher speed than existing models.
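The kind of check behind such a statement can be illustrated with a short script: rotate a structure and verify that the predicted energy barely changes. The sketch below uses ASE with its built-in EMT calculator as a stand-in so it runs on its own; in practice one would attach the ASE-compatible calculator of the trained potential instead (this is an illustrative assumption, not the authors’ exact procedure).

```python
# Minimal sketch of an invariance check: rotate a structure at random and
# compare the predicted energies. EMT is only a runnable stand-in; swap in
# the MLIP's ASE-compatible calculator to test a trained model.
import numpy as np
from ase.calculators.emt import EMT
from ase.cluster import Icosahedron

atoms = Icosahedron("Cu", noshells=2)   # small 13-atom test cluster
atoms.calc = EMT()                      # stand-in calculator
e_ref = atoms.get_potential_energy()

rng = np.random.default_rng(0)
deviations = []
for _ in range(10):
    rotated = atoms.copy()
    # rotate by a random angle (degrees) about a random axis, around the center of mass
    rotated.rotate(rng.uniform(0, 360), rng.normal(size=3), center="COM")
    rotated.calc = EMT()
    deviations.append(abs(rotated.get_potential_energy() - e_ref))

print(f"max energy deviation under rotation: {max(deviations):.3e} eV")
```

For a model that learns symmetries rather than having them built in, the interesting quantity is how small this deviation is compared with the accuracy needed for the simulation at hand.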
The authors validated PET-MAD by performing advanced property predictions in six case studies covering several different materials and comparing the results with models trained specifically for each problem. They found that in most cases the new universal model can be applied out of the box, with the same level of accuracy as models that have to be painstakingly optimized for one system at a time.
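In practice, “out of the box” means attaching the universal potential to a standard simulation workflow without system-specific retraining. The sketch below shows such a workflow with ASE: EMT again stands in so the snippet runs by itself, and the commented import path for the released model is an assumption rather than something taken from the paper.

```python
# Hedged sketch of an out-of-the-box workflow: attach a calculator and relax
# a perturbed structure. EMT is a runnable stand-in for the universal MLIP.
from ase.build import bulk
from ase.calculators.emt import EMT
from ase.optimize import BFGS

# from pet_mad.calculator import PETMADCalculator   # assumed interface, not from the paper
# calc = PETMADCalculator()
calc = EMT()                        # stand-in; swap in the MLIP calculator here

atoms = bulk("Cu", "fcc", a=3.7).repeat((2, 2, 2))  # deliberately off-equilibrium cell
atoms.rattle(stdev=0.05, seed=1)    # perturb atomic positions slightly
atoms.calc = calc

BFGS(atoms, logfile=None).run(fmax=0.02)            # relax atomic positions
print(f"relaxed energy: {atoms.get_potential_energy():.3f} eV")
```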
Among the most significant case studies is the one on ionic transport in lithium thiophosphate. “It’s a hot topic that many groups are studying as part of their work on solid-state electrolytes, and we show we can give the user, out of the box, the ability to screen arbitrary numbers of electrolyte materials and estimate their ionic conductivity,” says Mazitov. Another highlight is the calculation of the melting point of gallium arsenide. “Phase transitions and phase diagrams are things materials scientists need to study very often.”
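To give a feel for the screening step mentioned in the quote: once molecular dynamics with the potential yields a lithium diffusion coefficient, a rough ionic conductivity follows from the Nernst-Einstein relation. The numbers below are placeholders for illustration, not results from the paper.

```python
# Back-of-the-envelope Nernst-Einstein estimate of ionic conductivity from a
# diffusion coefficient obtained in MD. All numbers are placeholders.
from scipy import constants as c

T = 600.0          # temperature in K (placeholder)
D = 1.0e-11        # Li tracer diffusion coefficient in m^2/s (placeholder, from MD)
n = 2.0e28         # number density of mobile Li+ ions in 1/m^3 (placeholder)
q = c.elementary_charge   # charge of a Li+ ion in C

# sigma = n * q^2 * D / (kB * T), ignoring ion-ion correlations (Haven ratio ~ 1)
sigma = n * q**2 * D / (c.Boltzmann * T)
print(f"estimated ionic conductivity: {sigma:.2e} S/m")
```

Repeating this estimate across many candidate electrolytes, with the same universal potential driving the dynamics, is the kind of screening the quote refers to.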
Equally intriguing is the use of PET-MAD to study how quantum nuclear fluctuations affect nuclear magnetic resonance (NMR) chemical shielding in organic crystals, and hence how they affect NMR crystallography. “This touches the molecular world, which is not accessible to the current generation of universal MLIPs, and we tried to create a bridge to this domain.”
The model still has great room for improvement, says Mazitov. One of its current limitations is the underlying theory. “We use relatively simple electronic-structure approximations, but we would be happy to use more complex and more accurate DFT functionals,” he says. A second aspect is that the model focuses mostly on short-range interactions, and the researchers hope to include more of the long-range part of the interatomic potential. Finally, the model would be even more widely applicable if it were trained on an even more diverse dataset covering more classes of materials.
At the end of the day, one key contribution of PET-MAD is to make complex simulations available to all scientists. Previous universal models were trained on hundreds of millions of structures, at a computational cost that is prohibitive for many academic laboratories. Ceriotti and his team wanted to make the point that this is no longer necessary. “We show that by choosing and preparing the dataset wisely, PET-MAD makes it possible to increase the efficiency and decrease the cost of training, often to a level that is accessible to small-budget labs, without sacrificing accuracy, transferability or inference speed,” says Mazitov.
Journal: Nature Communications
Article Title: PET-MAD as a lightweight universal interatomic potential for advanced materials modeling
Article Publication Date: 27-Nov-2025