image: Two-Stage Hierarchical Pre-training Framework of S2ALM: In Stage I, the model learns foundational sequence–structure relationships from large-scale protein data, using 1D amino acid sequences and 3D structural sequences. This stage employs masked language modeling to embed general biochemical patterns. Stage II shifts focus to antibody-specific data, incorporating Sequence–Structure Matching (SSM) and Cross-Level Reconstruction (CLR) objectives to capture the intricate interplay between antibody sequences and structures.
Credit: Copyright © 2025 Mingze Yin et al.
Antibodies, also known as immunoglobulins, are specialized proteins produced by the immune system to fight harmful invaders such as viruses and other pathogens. Because each antibody must bind to a particular target, it has a unique structure specific to that target. Owing to their specificity and few adverse effects, antibodies are widely explored as therapeutic drugs.
Antibodies have traditionally been studied with tedious wet-lab methods, but molecular scientists are now turning to computational models that can design antibodies with greater precision in less time. In a leap toward advancing the use of AI in antibody design, a team of researchers from China has developed a groundbreaking AI model called S2ALM (Sequence-Structure multi-level pre-trained Antibody Language Model), which can analyze, predict, and design antibodies using structure-specific details.
The study was led by Professor Tingjun Hou and Professor Chang-Yu Hsieh from the College of Pharmaceutical Sciences, Zhejiang University, China, in collaboration with Assistant Professor Jintai Chen from AI Thrust, Information Hub, HKUST (Guangzhou), and Professor Jian Wu from the Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence, China. The findings of the study were made available online in Research on May 12, 2025.
“The molecular basis of any antibody protein lies in its amino acid sequence,” explains Prof. Hou. “The sequence determines its 3D structure, and the structure determines its biological function.”
While most existing AI models focus only on the amino acid sequence, S2ALM is the first of its kind to integrate both sequence and structure, offering a more complete understanding of how antibodies function. To build the model, the researchers trained it on a large dataset of 75 million antibody and protein sequences and 11.7 million 3D structures, including both experimentally determined and computer-predicted structures.
They also introduced two innovative learning strategies within a hierarchical pre-training paradigm (a stepwise AI training approach). In the first stage, the model learns general sequence-structure relationships from large-scale protein data through masked language modeling; in the second, it focuses on antibody-specific data using the two new strategies. Sequence-Structure Matching (SSM) helps the model link sequences with their corresponding structures, while Cross-Level Reconstruction (CLR) enables it to predict missing information by drawing on both sequence and structural clues.
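To make the two-stage idea more concrete, here is a toy sketch of how training examples might be prepared for masked language modeling, SSM, and CLR. This is not S2ALM's implementation. The mask symbol `#`, the helper names (`mask_tokens`, `make_ssm_pairs`, `make_clr_example`), the encoding of structure as one token per residue, random negative sampling for SSM, and masking only the sequence track for CLR are all assumptions made for illustration. The example strings are toy data.

```python
import random
from typing import Dict, List, Tuple

MASK = "#"  # hypothetical mask symbol, not taken from S2ALM


def mask_tokens(tokens: str, rate: float, rng: random.Random) -> Tuple[List[str], Dict[int, str]]:
    """Masked-language-modeling corruption: hide a random subset of positions.

    Returns the corrupted token list and a {position: original_token} map
    that a model would be trained to predict.
    """
    corrupted = list(tokens)
    targets: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            corrupted[i] = MASK
            targets[i] = tok
    return corrupted, targets


def make_ssm_pairs(seqs: List[str], structs: List[str], rng: random.Random) -> List[Tuple[str, str, int]]:
    """Sequence-structure matching data (assumed form): label 1 for a sequence
    with its own structure, 0 for the same sequence with another entry's structure."""
    if len(seqs) != len(structs) or len(seqs) < 2:
        raise ValueError("need at least two aligned sequence/structure entries")
    pairs = []
    for i, seq in enumerate(seqs):
        pairs.append((seq, structs[i], 1))  # matched (positive) pair
        j = rng.choice([k for k in range(len(seqs)) if k != i])
        pairs.append((seq, structs[j], 0))  # mismatched (negative) pair
    return pairs


def make_clr_example(seq: str, struct: str, rate: float,
                     rng: random.Random) -> Tuple[List[str], List[str], Dict[int, str]]:
    """Cross-level reconstruction data (one plausible reading): mask the sequence
    track but keep the structure track intact, so masked residues must be
    recovered from structural context as well as the remaining sequence."""
    if len(seq) != len(struct):
        raise ValueError("expected one structure token per residue")
    masked_seq, targets = mask_tokens(seq, rate, rng)
    return masked_seq, list(struct), targets


# Usage on toy data (hypothetical strings, not real S2ALM inputs)
rng = random.Random(0)
seqs = ["EVQLVES", "QVQLQES", "DIQMTQS"]     # toy antibody-like fragments
structs = ["dvvvpdd", "dpvvqdd", "ddpvvsd"]  # toy per-residue structure tokens
pairs = make_ssm_pairs(seqs, structs, rng)
masked, struct_ctx, targets = make_clr_example(seqs[0], structs[0], 0.3, rng)
```

A real pipeline would typically feed such examples to a Transformer encoder, with a token-level prediction loss on the masked positions and a binary classification loss on the matched and mismatched pairs; the sketch stops at data preparation.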
The combination of these strategies paid off. S2ALM outperformed other leading models on several key tasks in antibody research and drug development: prediction of antigen-binding capacity, tracking of B cell maturation (relevant to antibody development), identification of antibody paratopes (the specific antigen-binding regions), prediction of antibody-antigen binding strength (affinity), and design of new antibody sequences.
One of the most striking outcomes was the model's ability to generate entirely new antibody candidates that could target pathogens such as SARS-CoV-2, Ebola virus, and influenza B virus. Advanced structural predictions indicated that these AI-designed antibodies fold into stable, functional 3D shapes suitable for targeting disease.
“The success of S2ALM is threefold: first, it learns from comprehensive antibody data; second, its unique learning approach incorporates detailed structural information alongside biological features; and third, it surpasses state-of-the-art performance on a wide range of tasks, even in designing new antibodies,” remarks Prof. Wu.
The development of S2ALM marks a milestone in antibody research, and the model also holds real-world potential for therapeutic innovation. By reducing reliance on trial-and-error laboratory methods, it could accelerate the development of next-generation antibodies, bringing us one step closer to faster, more reliable, and more cost-effective immune-based therapies.
About Zhejiang University, China
Zhejiang University (ZJU), established in 1897 in Hangzhou, is one of the most prestigious research universities in China. It is a member of the C9 League (an alliance of nine top universities in China) and a core participant in national initiatives such as the Double First-Class initiative and Project 985.
ZJU comprises 37 colleges and schools and is known for its academic excellence, interdisciplinary innovation, and global engagement. Through high-end research collaborations, ZJU strives to foster future leaders with a global vision and social responsibility and to advance impactful interdisciplinary research, guided by its motto "Seeking Truth, Pursuing Innovation."
About the Journal Research (SPJ)
Launched in 2018, Research is the first journal in the Science Partner Journal (SPJ) program. Research is published by the American Association for the Advancement of Science (AAAS) in association with Science and Technology Review Publishing House. Research publishes fundamental research in the life and physical sciences as well as important findings or issues in engineering and applied science. The journal publishes original research articles, reviews, perspectives, and editorials. It has an impact factor (IF) of 10.7 and a CiteScore of 13.3.
Source: https://doi.org/10.34133/research.0721
Journal
Research
Method of Research
News article
Subject of Research
Not applicable
Article Title
S2ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning
Article Publication Date
19-Aug-2025