News Release

Proposing new evolutionary connections from a database of 214 million predicted protein structures

Large-scale protein data research with Foldseek cluster

Peer-Reviewed Publication

Seoul National University

Proteins are atomic-scale machines that operate the cell, playing important roles in energy generation, DNA replication, defence against diseases and many other processes. These machines come in all sorts of shapes and sizes. Teams at Seoul National University, ETH Zurich and EMBL-EBI present in the journal Nature a new approach capable of comparing the shapes and structures of hundreds of millions of proteins, allowing them to catalog how many different types of shapes there are in nature. Using this novel method, they have discovered unexpected similarities between human proteins involved in our immune system and proteins found in bacteria.

Proteins are produced in cells as chains of amino acids, like beads on a string, and then fold into 3D structures. There are about 20,000 protein types in humans, but across all species, there are over 200 million proteins. While scientists can easily decode the sequence of amino acids, determining proteins' 3D structures is far more challenging. However, a recent breakthrough in AI, AlphaFold2, allows protein structures to be predicted accurately. This prediction technology has enabled research groups at DeepMind and EMBL-EBI to create a treasure trove of over 200 million predicted protein structures. While many discoveries were made by studying these structures individually, including enzymes that degrade plastics or fight antibiotic resistance, examining them collectively has much greater potential. Doing so, however, required inventing a method that could efficiently calculate the similarities among hundreds of millions of protein structures.

Starting from a total of over 200 million structures, the team of scientists discovered that these can be grouped into just over two million different shapes. "We estimated that performing these calculations using established methods would have taken ten years, when compared to the five days it took using our new method," states Prof. Martin Steinegger from Seoul National University.

In evolution, the sequences of proteins change more rapidly than their structures. For this reason, the comparison of shapes can identify the same protein in distantly related species, even when this is not apparent from a comparison of their sequences. Indirectly, this tells us something about the ancient evolution of these proteins: "It is as if we found a sort of time machine that lets us look back in evolutionary history to study when in the very distant past these proteins originated," says Prof. Pedro Beltrao from ETH Zurich.

When analyzing the two million shape groups, the team discovered that many human immunity-related proteins had 3D shapes that were similar to proteins found in bacteria. These similarities suggest that our immune system may be based on proteins that are much older than previously thought and that the mechanisms to fight against pathogens are more broadly shared between species.

This work opens up new avenues, not just for immunity-related proteins, but also for the understanding of the evolution of the whole protein universe, thus promising more exciting discoveries. For an in-depth description of these findings, refer to the publication in [Nature, 2023, 10.1038/s41586-023-06510-w].
