A survey of downstream applications of evolutionary scale modeling protein language models
Higher Education Press
Image: Techniques of using ESM. Credit: Higher Education Press
The advent of the evolutionary scale modeling (ESM) series of protein language models (PLMs) marks a significant step in the convergence of large language models (LLMs) with protein representation. Trained on vast amounts of unlabeled protein sequence data, these models learn the intricate patterns of mutation and conservation that have sculpted protein families throughout evolutionary history. ESM has become a widely used family of foundation models for protein representation and downstream biological tasks.
Recently, Jie Zheng's research group at ShanghaiTech University, China, published a review article titled "A survey of downstream applications of evolutionary scale modeling protein language models" in Quantitative Biology. The article comprehensively summarizes the latest developments in the ESM models, systematically categorizes the techniques for using ESM and its downstream applications in biology, and discusses the current limitations of ESM and directions for future research. The review provides a valuable resource for exploring the capabilities of ESM models and the applications of large language models in the biomedical field.
Across a variety of downstream tasks, effectively harnessing the capabilities of ESM is key. As shown in Figure 1, the article identifies the following main categories of techniques:
- Direct use: Applying the models as-is, for example using ESM-IF1 for fixed-backbone protein design or ESMFold to predict three-dimensional protein structures.
- Integration with task-specific models: ESM is treated as a high-quality feature extractor, and its output embeddings are fed into task-specific deep learning networks for supervised training.
- Fine-tuning: To achieve better performance in specific downstream tasks, researchers often employ Parameter-Efficient Fine-Tuning (PEFT) methods on ESM models.
- Multimodality: Since a single ESM model typically contains information from only one modality, researchers utilize techniques such as contrastive learning or cross-attention mechanisms to fuse ESM's sequence information with multi-modal information like protein structures and drug molecules.
- The use of attention maps: The attention maps in Transformers are used directly to predict residue contact maps, or as additional features for predicting the structure of protein complexes.
- Evaluation and validation: The scores or probability distributions generated by ESM models are used to assist and guide directed evolution, or serve as an energy function for evaluating generated protein structures during simulated annealing optimization.
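To make the second category more concrete, the sketch below shows the typical "feature extractor plus task-specific head" pattern: per-residue embeddings from a frozen ESM encoder are mean-pooled into a fixed-size protein vector and passed to a small supervised model. The embeddings, weights, and the binary-function-prediction task are all placeholders (real embeddings would come from an ESM model such as ESM-2, whose 650M-parameter variant produces 1280-dimensional residue embeddings); this is a minimal illustration, not the survey's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-residue embeddings from a frozen ESM encoder:
# shape (sequence_length, embedding_dim). Random values stand in
# here so the sketch is self-contained and runnable.
seq_len, emb_dim = 120, 1280
residue_embeddings = rng.standard_normal((seq_len, emb_dim))

# Mean-pool over residues to get one fixed-size protein vector --
# a common way to feed variable-length proteins to a downstream model.
protein_vector = residue_embeddings.mean(axis=0)  # shape (1280,)

# Tiny task-specific head (one hidden layer, sigmoid output) that
# would normally be trained on labeled data, e.g. for binary
# function prediction. Weights here are random placeholders.
W1 = rng.standard_normal((emb_dim, 64)) * 0.02
b1 = np.zeros(64)
W2 = rng.standard_normal((64, 1)) * 0.02
b2 = np.zeros(1)

hidden = np.maximum(protein_vector @ W1 + b1, 0.0)          # ReLU
score = float(1.0 / (1.0 + np.exp(-(hidden @ W2 + b2))))    # sigmoid

print(protein_vector.shape, score)
```

In practice the head is trained with standard supervised losses while the ESM encoder stays frozen, which keeps training cheap relative to fine-tuning the full language model.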