image: Artistic representation of intrinsically disordered proteins.
Credit: Ramanna Shrinivas
Key Takeaways
- Researchers at Harvard and Northwestern have developed a machine learning method that can design intrinsically disordered proteins with custom properties. These proteins make up nearly 30% of all human proteins and remain out of reach of AI tools like AlphaFold.
- The new approach uses automatic differentiation, traditionally a deep learning tool, to optimize protein sequences for desired properties.
- Working directly from physics-based models, the method opens new possibilities for engineering proteins that do not fold into a specific shape.
In synthetic and structural biology, advances in artificial intelligence have led to an explosion in the design of new proteins with specific functions, from antibodies to blood clotting agents, made possible by computers that can accurately predict the 3D structure of a given amino acid sequence.
But the structures of close to 30% of all proteins encoded by the human genome are challenging to predict, even for the most powerful AI tools, including the Nobel-winning AlphaFold. Never settling into a fixed shape but constantly shifting, these so-called intrinsically disordered proteins are key to countless biological functions such as cross-linking molecules, sensing, and signaling, but their inherent instability makes them difficult to design from scratch.
A team at the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) and Northwestern University has demonstrated a new machine learning method that can design intrinsically disordered proteins with tailored properties. The work opens doors to a new understanding of these mysterious biomolecules and possibly to new insights into the origins of, and treatments for, diseases.
The work is published in Nature Computational Science and was co-led by SEAS graduate student Ryan Krueger and former NSF-Simons QuantBio Fellow Krishna Shrinivas, now an assistant professor at Northwestern, in collaboration with Michael Brenner, the Catalyst Professor of Applied Mathematics and Applied Physics at SEAS.
Shrinivas said he became interested in studying intrinsically disordered proteins (IDPs) because they are out of reach of current AI-based methods, such as Google DeepMind’s AlphaFold, for predicting and designing proteins with distinct conformations. Yet such disordered proteins are important to many fundamental aspects of biology, and mutations in these proteins are known to be linked to diseases like cancer and neurodegeneration. One example is alpha-synuclein, long implicated in Parkinson’s and other diseases. To design IDPs for synthetic or therapeutic uses, Shrinivas said, “we needed to either come up with better AI models, or, we needed to come up with a way to actually take those physics models where you not only get good predictions, but you also get the physics for free.”
Automatic differentiation algorithms
The paper describes a computational method powered by algorithms that can perform “automatic differentiation,” or automatic computation of derivatives – instantaneous rates of change – in order to rationally select for protein sequences with desired behaviors or properties. The technique is a widely used tool for deep learning and training neural networks, but Brenner and his lab were among the first to recognize other potential use cases, such as optimizing physics-based molecular dynamics simulations.
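To make "automatic differentiation" concrete, here is a minimal forward-mode sketch in plain Python using dual numbers, which carry a value and its derivative through every arithmetic step so that the derivative comes out exact rather than estimated by finite differences. The `Dual` class, the `derivative` helper and the toy `energy` function are hypothetical illustrations, not code from the study; libraries such as JAX implement the same idea far more generally.

```python
import math


class Dual:
    """Forward-mode AD value: `val` holds f(x), `der` holds df/dx."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        # Plain numbers are treated as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def exp(x):
    # Chain rule: (e^u)' = e^u * u'
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def derivative(f, x):
    """Return (f(x), f'(x)) from one evaluation seeded with dx/dx = 1."""
    out = f(Dual(x, 1.0))
    return out.val, out.der


# Hypothetical toy "energy" of a single coordinate: E(x) = x * exp(-x)
def energy(x):
    return x * exp(-x)


value, slope = derivative(energy, 2.0)
print(value, slope)
```

The printed slope matches the hand-derived (1 - x)e^(-x) = -e^(-2), about -0.135, at x = 2. Deep learning frameworks usually favor reverse mode (backpropagation) instead, because it delivers the gradient with respect to many inputs, such as every position in a sequence, from a single backward pass.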
With automatic differentiation, the researchers were able to train a computer to recognize how small changes in protein sequences – even single amino acid changes – affect the final desired properties of proteins. They likened their method to a very powerful search engine for amino acid sequences that fit the criteria needed to perform a function – say, one that creates loops or connectors, or can sense different things in the environment.
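The sketch below illustrates how a gradient can rank single amino acid changes. It assumes a hypothetical three-letter alphabet and made-up nearest-neighbor pair energies as a stand-in for a simulated property; none of these names or numbers come from the study, and the gradient is written out by hand where an automatic differentiation tool would normally supply it.

```python
# Hypothetical three-letter alphabet and nearest-neighbor pair energies.
ALPHABET = "KEG"
# J[a][b]: illustrative energy when residue a is followed by residue b.
J = [[1.0, -2.0, 0.0],
     [-2.0, 1.0, 0.0],
     [0.0, 0.0, 0.2]]


def one_hot(seq):
    """Encode a sequence as per-position weights over ALPHABET."""
    return [[1.0 if res == r else 0.0 for r in ALPHABET] for res in seq]


def prop(P):
    """Toy property: summed pair energies between neighboring positions."""
    k = len(ALPHABET)
    return sum(P[i][a] * P[i + 1][b] * J[a][b]
               for i in range(len(P) - 1) for a in range(k) for b in range(k))


def grad(P):
    """d prop / d P[i][r], derived by hand for this toy model."""
    n, k = len(P), len(ALPHABET)
    g = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for r in range(k):
            if i + 1 < n:  # residue r as the left partner of position i + 1
                g[i][r] += sum(J[r][b] * P[i + 1][b] for b in range(k))
            if i > 0:      # residue r as the right partner of position i - 1
                g[i][r] += sum(P[i - 1][a] * J[a][r] for a in range(k))
    return g


def predicted_change(seq, i, new_res):
    """Gradient-based estimate of the effect of mutating position i."""
    g = grad(one_hot(seq))
    return g[i][ALPHABET.index(new_res)] - g[i][ALPHABET.index(seq[i])]


seq = "KKEGK"
mutant = seq[:3] + "K" + seq[4:]  # substitute G -> K at position 3
actual = prop(one_hot(mutant)) - prop(one_hot(seq))
print(predicted_change(seq, 3, "K"), actual)  # prints -1.0 -1.0
```

In this toy model the gradient's prediction is exact, because the property is linear in each position's weights when the other positions are held fixed. For a property computed from a real simulation, the same gradient is only a first-order estimate of what a mutation will do.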
“We didn’t want to have to take a bunch of data and train a machine learning model to design proteins,” Krueger said. “We wanted to leverage existing, sufficiently accurate simulations to be able to design proteins at the level of those simulations.”
The method leverages gradient-based optimization, a traditional framework for training neural networks, to identify new protein sequences efficiently and precisely. Because the design process is “differentiable,” the resulting proteins are not best guesses predicted by AI; rather, they are grounded in molecular dynamics simulations that use real physics and take into account how proteins actually, dynamically behave in nature.
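A minimal sketch of gradient-based sequence optimization follows. It assumes a common relaxation in which each position is a probability distribution over residue types (a softmax over free parameters) so that gradients can flow; this is an illustrative assumption, not a description of the authors' pipeline. The objective, a hypothetical target for mean net charge per residue, stands in for a property that would really come from a molecular dynamics simulation, and its gradient is derived by hand.

```python
import math

# Hypothetical per-residue charges for a three-letter toy alphabet.
ALPHABET = "KEG"
CHARGE = [1.0, -1.0, 0.0]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def mean_charge(logits):
    """Expected net charge per residue of a relaxed sequence."""
    total = 0.0
    for z in logits:
        p = softmax(z)
        total += sum(pr * q for pr, q in zip(p, CHARGE))
    return total / len(logits)


def loss_and_grad(logits, target):
    """Squared error to the target and its gradient w.r.t. every logit."""
    n = len(logits)
    value = mean_charge(logits)
    dloss = 2.0 * (value - target)  # d/dvalue of (value - target)**2
    grads = []
    for z in logits:
        p = softmax(z)
        m = sum(pr * q for pr, q in zip(p, CHARGE))
        # Softmax chain rule: dm/dz_s = p_s * (q_s - m)
        grads.append([dloss * ps * (qs - m) / n for ps, qs in zip(p, CHARGE)])
    return (value - target) ** 2, grads


def optimize(length, target, lr=5.0, steps=300):
    """Plain gradient descent on the logits, starting from uniform weights."""
    logits = [[0.0] * len(ALPHABET) for _ in range(length)]
    for _ in range(steps):
        _, grads = loss_and_grad(logits, target)
        logits = [[zs - lr * gs for zs, gs in zip(z, g)]
                  for z, g in zip(logits, grads)]
    return logits


logits = optimize(length=5, target=0.4)
print(round(mean_charge(logits), 3))  # prints 0.4
```

The output is still a set of per-position probabilities rather than a concrete sequence; turning it into one (for example by sampling, or by picking the most likely residue at each position) is a separate step not shown here.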
The research received federal support from the National Science Foundation AI Institute of Dynamic Systems, the Office of Naval Research, the Harvard Materials Research Science and Engineering Center, and the NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard.
Journal
Nature Computational Science
Method of Research
Computational simulation/modeling
Subject of Research
Not applicable
Article Title
Generalized design of sequence–ensemble–function relationships for intrinsically disordered proteins
Article Publication Date
6-Oct-2025