News Release

Researchers create multimodal sentiment analysis method that improves detection of human emotions while reducing computational cost

A novel approach to multimodal sentiment analysis, called R3DG, offers improved digital detection of human emotions at reduced computational cost

Peer-Reviewed Publication

Research

Novel multimodal sentiment analysis approach

Researchers have developed a novel MSA method that improves sentiment detection while reducing computational cost

Credit: Professor Fuji Ren from University of Electronic Science and Technology of China

Multimodal sentiment analysis (MSA) is an emerging technology that seeks to digitally automate the extraction and prediction of human sentiments from text, audio, and video. With advances in deep learning and human-computer interaction, research in MSA is receiving significant attention. However, when training MSA models or making predictions, aligning different modalities such as text, audio, and video for analysis can pose significant challenges.

There are several ways of aligning various modalities in MSA. Most MSA methods align modalities either at the ‘coarse-grained level’ by grouping representations over different time steps or at the ‘fine-grained level’ by grouping modalities at each time step (step-by-step alignment). However, these approaches can fail to capture the individual variations in emotional expression or differences in contexts in which sentiments are expressed. To overcome this crucial limitation, researchers have now developed a framework for analyzing inputs of different modalities. Their study, which was recently published in Research, shows that the framework ‘Retrieve, Rank, and Reconstruction with Different Granularities (R3DG)’ outperforms existing analysis methods and reduces the computational time required for analysis.

“Coarse-grained methods may miss subtle emotional cues like a ‘head nod’, ‘frown’, or ‘high pitch’, especially in long videos. On the other hand, fine-grained alignment can lead to fragmented representations, where emotional events are divided into multiple time steps, creating data redundancy. Furthermore, these methods are computationally expensive due to the need for extensive attention-based alignment,” explains Professor Fuji Ren of the University of Electronic Science and Technology of China, the lead researcher of the study.

Existing MSA approaches either average features over all time steps or align various features at each step, achieving at most one granularity of alignment. In contrast, R3DG analyzes representations at varying granularities, thus preserving potentially critical information and capturing emotional nuances across modalities. By aligning the audio and video modalities to the text modality using representations at varying granularities, R3DG reduces computational complexity while enhancing the model’s ability to capture nuanced emotional fluctuations. Its segmentation and selection of the most relevant audio and video features—combined with reconstruction to preserve critical information—contribute to more accurate and efficient sentiment prediction.
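The retrieve-rank-reconstruct idea described above can be sketched in a few lines. The function names, the cosine-similarity scoring, and the softmax-weighted reconstruction below are illustrative assumptions for exposition only, not the published R3DG implementation:

```python
import math
import random

def cosine(u, v):
    # Cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve_rank_reconstruct(text_repr, av_segments, top_k=4):
    # "Retrieve": score each audio/video segment against the text representation
    scores = [cosine(text_repr, seg) for seg in av_segments]
    # "Rank": keep only the top-k most text-relevant segments
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # "Reconstruct": a softmax-weighted average of the selected segments,
    # so the most relevant features dominate the compact representation
    exp = [math.exp(scores[i]) for i in ranked]
    total = sum(exp)
    dim = len(text_repr)
    return [sum(exp[j] / total * av_segments[ranked[j]][d] for j in range(top_k))
            for d in range(dim)]

random.seed(0)
text = [random.gauss(0, 1) for _ in range(16)]                           # one coarse text vector
segments = [[random.gauss(0, 1) for _ in range(16)] for _ in range(20)]  # finer audio/video segments
fused = retrieve_rank_reconstruct(text, segments)
print(len(fused))  # 16
```

Because only the top-k segments survive the ranking step, downstream alignment operates on far fewer vectors than step-by-step (fine-grained) alignment would, which is one intuition for the reduced computational cost.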

The researchers critically assessed the comparative performance of R3DG using five benchmark MSA datasets. R3DG demonstrated superior performance compared to existing methods across these datasets, while substantially reducing computational time. The findings suggest that R3DG may be among the most efficient MSA methods.

“Experimental results demonstrate that R3DG achieves state-of-the-art performance in multiple multimodal tasks, including sentiment analysis, emotion recognition, and humor detection, outperforming existing methods. Ablation studies further confirm R3DG’s superiority, highlighting its robust performance despite the reduced computational cost,” says Dr. Jiawen Deng, the co-corresponding author, highlighting the main findings of the study.

R3DG achieves modality alignment in just two steps—first between video and audio modalities, and then between their fused representation and text. This streamlined approach significantly reduces computational cost compared to most existing models. With its enhanced efficiency, R3DG demonstrates strong potential to drive the next generation of MSA.
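The two-step alignment scheme can be pictured schematically. The toy `average_fuse` function below stands in for an attention-based alignment module; it is a hypothetical simplification, not R3DG's actual fusion operator:

```python
def two_step_alignment(video, audio, text, fuse):
    # Step 1: align video with audio into one audiovisual representation
    av = fuse(video, audio)
    # Step 2: align the fused audiovisual representation with text
    return fuse(av, text)

def average_fuse(u, v):
    # Toy fusion: elementwise average of two equal-length vectors,
    # standing in for a learned attention-based alignment module
    return [(a + b) / 2 for a, b in zip(u, v)]

video, audio, text = [1.0, 3.0], [3.0, 1.0], [2.0, 2.0]
print(two_step_alignment(video, audio, text, average_fuse))  # [2.0, 2.0]
```

Full pairwise alignment of three modalities would require three fusion operations (video-audio, video-text, audio-text); the two-step scheme needs only two, which illustrates one source of the cost savings described above.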

“Looking ahead, future work will focus on automating the selection of modality importance and granularity, further enhancing R3DG’s adaptability to diverse real-world applications,” states Professor Ren, anticipating exciting future improvements to the MSA approach.

About the University

Founded in 1956, the University of Electronic Science and Technology of China (UESTC) is a public university in Chengdu affiliated with the Ministry of Education of China. It receives funding from the Ministry of Education, the Ministry of Industry and Information Technology, the Sichuan Provincial Government, and the Chengdu Municipal Government. Although UESTC focuses on electronic science and technology, it has leading researchers in varied disciplines.

About the Journal

Launched in 2018, Research is the first journal in the Science Partner Journal (SPJ) program. Research is published by the American Association for the Advancement of Science (AAAS) in association with Science and Technology Review Publishing House. Research publishes fundamental research in the life and physical sciences as well as important findings and issues in engineering and applied science. The journal publishes original research articles, reviews, perspectives, and editorials. Its impact factor is 10.7 and its CiteScore is 13.3.

Source: https://doi.org/10.34133/research.0729


Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.