New discovery enables gene therapy for muscular dystrophies, other disorders
Peer-Reviewed Publication
Last Updated: 12-May-2025 07:09 ET (12-May-2025 11:09 GMT/UTC)
A new study presents “Evo,” a machine learning model capable of decoding and designing DNA, RNA, and protein sequences, from molecular to genome scale, with unparalleled accuracy. Evo’s ability to predict, generate, and engineer entire genomic sequences could change the way synthetic biology is done. “The ability to predict the effects of mutations across all layers of regulation in the cell and to design DNA sequences to manipulate cell function would have tremendous diagnostic and therapeutic implications for disease,” writes Christina Theodoris in a related Perspective.
With a vocabulary of just four nucleotides, DNA encodes all the genetic information essential for life. Variations in the genomic sequence reflect adaptations selected for specific biological functions, and these variations drive evolution by enabling organisms to adapt to new or changing environments. Advances in DNA sequencing technologies have allowed genomic variation to be mapped at the whole-genome scale. Combined with novel machine learning algorithms, these data could enable the creation of a comprehensive model that understands DNA, RNA, and protein functions and their interactions. However, while some researchers, inspired by the success of large language models (LLMs), have attempted to model DNA as a “language” using similar techniques, current generative models tend to focus narrowly on individual molecules or DNA segments. Together with computational limitations, this narrow focus has kept these models from capturing the broader genomic interactions needed to understand complex biological processes.
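To make the idea of DNA as a “language” with a four-letter vocabulary concrete, here is a minimal Python sketch. It assumes one token per nucleotide and uses a hypothetical toy order-k Markov model (`train_kmer_model`, `next_base_probs`) as a stand-in for a neural language model. It is not Evo or StripedHyena; it only illustrates what "predicting the next nucleotide" means.

```python
from collections import Counter, defaultdict

VOCAB = "ACGT"  # DNA's four-letter "vocabulary"
TOKEN_ID = {base: i for i, base in enumerate(VOCAB)}

def tokenize(seq: str) -> list[int]:
    """Map a DNA string to integer tokens, one token per nucleotide."""
    return [TOKEN_ID[base] for base in seq.upper()]

def train_kmer_model(seqs, k=2):
    """Count which nucleotide follows each length-k context.

    A hypothetical toy stand-in for a neural language model: it learns
    next-token statistics from example sequences.
    """
    counts = defaultdict(Counter)
    for seq in seqs:
        seq = seq.upper()
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts

def next_base_probs(counts, context, alpha=1.0):
    """Laplace-smoothed probability of each nucleotide following `context`."""
    c = counts.get(context, Counter())
    total = sum(c.values()) + alpha * len(VOCAB)
    return {base: (c[base] + alpha) / total for base in VOCAB}

print(tokenize("ATGC"))  # → [0, 3, 2, 1]
model = train_kmer_model(["ATGATGATGATG"], k=2)
probs = next_base_probs(model, "AT")
print(max(probs, key=probs.get))  # → G
```

A model like Evo replaces the count table with a deep network and conditions on a far longer stretch of sequence, but the training target, the next nucleotide, is the same kind of prediction.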
Here, Eric Nguyen and colleagues present Evo, a large-scale genomic foundation model with 7 billion parameters, designed to generate DNA sequences up to whole-genome scale. Built on the StripedHyena architecture, Evo was trained on a dataset of 2.7 million evolutionarily diverse microbial genomes. According to Nguyen et al., Evo excels at both predictive and generative biological tasks, achieving high accuracy in zero-shot evaluations (that is, without task-specific training) both for predicting the impact of mutations on bacterial proteins and RNA and for modeling gene regulation. Evo also captures the intricate coevolution between coding and noncoding sequences, supporting the design of complex biological systems such as CRISPR-Cas complexes and transposable elements. At the genomic scale, Evo can generate sequences over 1 megabase in length, a capability that far surpasses prior models. “Future models may learn from diverse human and other eukaryotic genomes, using larger context lengths to capture distant genomic interactions over larger genomic scales,” writes Theodoris in the Perspective.
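One common way a sequence model can score mutations zero-shot is to compare the log-likelihood of the mutant sequence with that of the wild type: if the model finds the mutant much less likely, the mutation is predicted to be more disruptive. The sketch below assumes that likelihood-based recipe (the article does not spell out Evo's exact scoring) and uses a hypothetical toy Markov model, `fit_markov`, in place of Evo itself.

```python
import math
from collections import Counter, defaultdict

def fit_markov(train_seqs, k=1, alpha=1.0):
    """Hypothetical toy order-k nucleotide model standing in for Evo.

    Returns a function that computes the Laplace-smoothed log-likelihood of a
    DNA sequence (the first k bases are treated as given context).
    """
    counts = defaultdict(Counter)
    for s in train_seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1

    def log_likelihood(seq):
        ll = 0.0
        for i in range(k, len(seq)):
            c = counts.get(seq[i - k:i], Counter())
            ll += math.log((c[seq[i]] + alpha) / (sum(c.values()) + 4 * alpha))
        return ll

    return log_likelihood

def zero_shot_score(log_likelihood, wild_type, pos, alt):
    """log P(mutant) - log P(wild type); negative means the mutant looks less 'natural'."""
    mutant = wild_type[:pos] + alt + wild_type[pos + 1:]
    return log_likelihood(mutant) - log_likelihood(wild_type)

# Usage: train on a short repetitive sequence, then mutate position 4 (T -> C).
ll = fit_markov(["ATGATGATGATGATG"], k=1)
score = zero_shot_score(ll, "ATGATGATG", pos=4, alt="C")
print(round(score, 3))  # → -2.773
```

No labeled mutation data is used here: the score comes entirely from what the model learned about which sequences are plausible, which is what makes the evaluation "zero-shot."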
Capturing carbon dioxide from the hot industrial exhaust of cement and steel plants requires cooling the exhaust from around 200°C to 60°C so that liquid amines can react with the CO2. UC Berkeley chemists have created a new type of metal-organic framework (MOF) that captures CO2 at high temperatures, avoiding the need to expend energy and water to cool the exhaust. The MOF opens up a new field of high-temperature gas capture.
Researchers at New York University have devised a mathematical approach to predict the structures of crystals—a critical step in developing many medicines and electronic devices—in a matter of hours using only a laptop, a process that previously took a supercomputer weeks or months. Their novel framework is published in the journal Nature Communications.