image: Infographic summarizing five key domains where LLMs revolutionize bioinformatics: 1) DNA/RNA sequence analysis, 2) Protein structure prediction, 3) Multi-omics data integration, 4) Drug discovery, and 5) Biomedical literature mining.
Credit: Lin et al./Briefings in Bioinformatics 2025 (Based on Biorender.com)
Large language models (LLMs), the class of artificial intelligence systems behind tools like ChatGPT, are rapidly reshaping bioinformatics research. A new review systematically details how these models decode complex biological data, from predicting protein structures to identifying disease-linked genes, with unprecedented speed and accuracy.
Published in Briefings in Bioinformatics, the review outlines five core strengths of LLMs:
- Processing long biological sequences (e.g., DNA, proteins) using advanced tokenization and attention mechanisms.
- Capturing semantic patterns in data for tasks like gene annotation and drug-target interaction prediction.
- Cross-modal learning to integrate text, genomics, and structural biology data.
- Reducing manual effort, such as hand-crafted feature engineering, via end-to-end learning.
- Leveraging unlabeled data through self-supervised training.
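To make the first and last points concrete, here is a minimal Python sketch of how a DNA sequence might be tokenized and prepared for self-supervised training. It is an illustration only, not code from the review: the function names `kmer_tokenize` and `mask_tokens`, the 15% mask rate, and the `[MASK]` placeholder are assumptions chosen for the example. Overlapping k-mer tokenization is the scheme used by the original DNABERT; other models use different tokenizers.

```python
import random


def kmer_tokenize(sequence: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty token list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, seed=0, mask_token="[MASK]"):
    """Hide a random subset of tokens for masked-token prediction.

    Returns (masked_tokens, labels), where labels maps each masked
    position to the original token the model would have to recover.
    mask_rate and mask_token are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    n_mask = max(1, round(len(tokens) * mask_rate)) if tokens else 0
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, labels


tokens = kmer_tokenize("ATGCGTAC", k=3)
print(tokens)  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']

masked, labels = mask_tokens(tokens)
print(masked)  # one token replaced by '[MASK]'
print(labels)  # position -> original token
```

No labels are needed here: the training signal comes from the sequence itself, which is what allows such models to learn from large unlabeled genomic datasets. Note that with overlapping k-mers, a single masked token can be partly inferred from its neighbors, so practical implementations tend to mask contiguous spans instead; this sketch masks independent positions for simplicity.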
LLMs have enabled breakthroughs such as:
- Protein structure prediction (e.g., ESMFold) for drug design.
- Genome interpretation (e.g., DNABERT) for identifying disease mutations.
- Drug repurposing using tools like PharmBERT to analyze clinical literature.
However, challenges persist in model transparency, computational costs, and data biases. The authors call for:
- Multimodal AI systems combining genomic, imaging, and clinical data.
- Explainable AI frameworks to build scientific trust.
- Ethical guidelines for privacy in biomedical AI.
"LLMs are not just tools—they represent a paradigm shift in how we study life sciences," said senior author Dr. Peng Luo. "Their integration with experimental biology will accelerate discoveries from lab to clinic."
Journal
Briefings in Bioinformatics
Article Title
Bridging artificial intelligence and biological sciences: a comprehensive review of large language models in bioinformatics
COI Statement
None declared.