News Release

Omni-modal language models: Paving the way toward artificial general intelligence

Peer-Reviewed Publication

ELSP

Omni-modal language models integrate modality alignment, semantic fusion, and joint representation to enable unified perception and reasoning across text, image, and audio modalities.

Credit: Zheyun Qin & Lu Chen / Shandong University & Shandong Jianzhu University

The paper “A Survey on Omni-Modal Language Models” offers a systematic overview of the technological evolution, structural design, and performance evaluation of omni-modal language models (OMLMs). The work highlights how OMLMs enable unified perception, reasoning, and generation across modalities, contributing to the ongoing progress toward Artificial General Intelligence (AGI).

Recently, Lu Chen, a master’s student at the School of Computer and Artificial Intelligence, Shandong Jianzhu University, in collaboration with Dr. Zheyun Qin, a postdoctoral researcher at the School of Computer Science and Technology, Shandong University, published a comprehensive review entitled “A Survey on Omni-Modal Language Models” in the journal AI+.

The paper provides an in-depth analysis of the core technological evolution, representative architectures, and multi-level evaluation frameworks of omni-modal language models (OMLMs)—a new generation of AI systems that integrate and reason across multiple modalities, including text, image, audio, and video.

Unlike traditional multimodal systems, which typically center on a single dominant modality, OMLMs achieve modality alignment, semantic fusion, and joint representation learning, enabling dynamic collaboration among modalities within a unified semantic space. This paradigm supports end-to-end task processing, from perception through reasoning to generation, bringing AI one step closer to human-like cognition.
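
To make the idea of a unified semantic space more concrete, the sketch below shows one minimal way such a pipeline can be wired together in PyTorch: a linear alignment head per modality projects text, image, and audio features into a shared embedding space, a self-attention layer fuses the aligned modality tokens, and pooling yields a joint representation. This is an illustrative sketch only; the class name, feature dimensions, and fusion choice are assumptions made for this example, not the architecture of any specific model covered by the survey.

    import torch
    import torch.nn as nn


    class OmniModalFusion(nn.Module):
        """Toy alignment-and-fusion module: project each modality into a shared
        space, then fuse the modality tokens with self-attention."""

        def __init__(self, dims=None, shared_dim=256):
            super().__init__()
            # Assumed raw feature sizes produced by upstream text/image/audio encoders.
            dims = dims or {"text": 768, "image": 1024, "audio": 512}
            # Modality alignment: one projection head per modality into the shared space.
            self.align = nn.ModuleDict(
                {name: nn.Linear(d, shared_dim) for name, d in dims.items()}
            )
            # Semantic fusion: self-attention across the aligned modality tokens.
            self.fuse = nn.MultiheadAttention(shared_dim, num_heads=4, batch_first=True)
            self.norm = nn.LayerNorm(shared_dim)

        def forward(self, features):
            # features: dict of modality name -> (batch, feature_dim) tensor
            tokens = torch.stack(
                [self.align[name](x) for name, x in features.items()], dim=1
            )  # (batch, n_modalities, shared_dim)
            fused, _ = self.fuse(tokens, tokens, tokens)
            # Joint representation: pool the attended modality tokens.
            return self.norm(fused.mean(dim=1))


    model = OmniModalFusion()
    batch = {
        "text": torch.randn(2, 768),
        "image": torch.randn(2, 1024),
        "audio": torch.randn(2, 512),
    }
    print(model(batch).shape)  # torch.Size([2, 256])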

The study also introduces lightweight adaptation strategies, such as modality pruning and adaptive scheduling, to improve deployment efficiency in real-time medical and industrial scenarios. Furthermore, it explores domain-specific applications of OMLMs in healthcare, education, and industrial quality inspection, demonstrating their versatility and scalability.
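
As a rough illustration of what modality pruning can look like at deployment time, the sketch below simply skips the encoders of modalities whose importance score falls below a threshold, so their compute is never spent at inference. The function name, scores, threshold, and encoder stubs are hypothetical placeholders for this example, not the method described in the survey.

    from typing import Callable, Dict

    import torch


    def prune_and_encode(
        inputs: Dict[str, torch.Tensor],
        encoders: Dict[str, Callable[[torch.Tensor], torch.Tensor]],
        importance: Dict[str, float],
        threshold: float = 0.3,
    ) -> Dict[str, torch.Tensor]:
        """Run only the encoders whose modality importance clears the threshold;
        pruned modalities are skipped entirely, saving compute at inference time."""
        kept = {}
        for name, x in inputs.items():
            if importance.get(name, 0.0) >= threshold:
                kept[name] = encoders[name](x)
        return kept


    # Stand-in encoders; a real deployment would call the per-modality backbones here.
    encoders = {m: (lambda x: x.mean(dim=-1, keepdim=True)) for m in ("text", "image", "audio")}
    inputs = {m: torch.randn(1, 16) for m in encoders}
    # Example scores: audio contributes little to this request, so it is pruned.
    scores = {"text": 0.9, "image": 0.6, "audio": 0.1}
    print(list(prune_and_encode(inputs, encoders, scores).keys()))  # ['text', 'image']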

“Omni-modal models represent a paradigm shift in artificial intelligence,” said Lu Chen, the first author of the paper.

“By integrating perception, understanding, and reasoning within a unified framework, they bring AI closer to the characteristics of human cognition.”

Corresponding author Dr. Zheyun Qin added:

“Our survey not only summarizes the current progress of omni-modal research but also provides forward-looking insights into structural flexibility and efficient deployment.”

This work offers a comprehensive reference for researchers and practitioners in the field of multimodal intelligence and contributes to the convergence of large language models and multimodal perception technologies.

This paper was published in AI Plus (Chen L., Mu J., Wang J., Kang X., Xi X., Qin Z., A Survey on Omni-Modal Language Models, AI Plus, 2026, 1:0001. DOI: 10.55092/aiplus20260001).

