image: Architecture of the Multimodal Perception-based Weakly Supervised Model for Bird Cluster Counting.
Credit: Shu-xiang FENG, Meng-xue LYU, Xue-tao HAN, Chang Liu and Jun Qiu
Wetland avifauna serves as a crucial bioindicator of ecosystem health, and monitoring bird populations is a critical component of wetland management and conservation. However, traditional counting methods, such as point counts and line transects, are time-consuming, costly, and prone to human error. Optical image-based bird counting makes large-scale surveys possible, but target detection and accurate counting remain challenging under complex environmental conditions.
To address these challenges, a team of researchers in China presents an avian population estimation approach that requires no location annotations. It integrates optical characteristics with visual semantics and uses only count-level (quantitative) annotations to achieve weakly supervised counting, significantly reducing labeling costs.
The study is published in the KeAi journal Watershed Ecology and the Environment.
“Building upon enhanced optical image features, we constructed a multimodal perception model incorporating learnable feature adapters,” shares corresponding author Chang Liu, professor at the Institute of Applied Mathematics, Beijing Information Science and Technology University. “The model employs visual prompts to focus on counting-relevant features and utilizes residual connections to address challenges posed by pose variations and complex backgrounds.”
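The adapter-plus-residual idea can be pictured with a minimal PyTorch sketch. This is an illustrative assumption only: the FeatureAdapter class, its bottleneck dimensions, and the blending weight are hypothetical and are not the authors' released code.

    import torch
    import torch.nn as nn

    class FeatureAdapter(nn.Module):
        """Bottleneck adapter with a residual connection (illustrative sketch)."""
        def __init__(self, dim: int, hidden: int = 64, alpha: float = 0.2):
            super().__init__()
            self.down = nn.Linear(dim, hidden)   # project features down
            self.up = nn.Linear(hidden, dim)     # project back to backbone width
            self.act = nn.ReLU()
            self.alpha = alpha                   # blend of adapted vs. original features

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            adapted = self.up(self.act(self.down(x)))
            # The residual connection keeps the frozen backbone features dominant,
            # which is the kind of mechanism the authors describe for coping with
            # pose variation and cluttered backgrounds.
            return self.alpha * adapted + (1 - self.alpha) * x

In this sketch only the small adapter is trained while the backbone features pass through unchanged, which is one common way to adapt a large pretrained visual encoder at low annotation and compute cost.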
The count regression problem was transformed into a classification task by embedding ordered numerical sequences (e.g., “0 birds”, “5 birds”, … “100+ birds”) as semantic category labels. The text template “There are [class] birds in the picture” leads the model to align numerical semantics from the text with image features, enabling accurate counting without explicit object localization. In addition, to handle multi-scale variations in bird flocks, the researchers designed a cross-scale information interaction module that propagates visual prompts across feature scales, generating semantically rich fused representations.
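The count-as-classification mechanism can be sketched with an off-the-shelf CLIP model: count bins become text prompts, and image-text similarity acts as a distribution over those bins. This is a simplified assumption about the approach, not the paper's model; the bin spacing, the public checkpoint, and the image path below are placeholders, and the authors' method additionally uses learnable adapters, visual prompts, and cross-scale fusion.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Ordered count bins used as semantic class labels (hypothetical spacing).
    count_bins = list(range(0, 101, 5))  # 0, 5, ..., 100
    prompts = [f"There are {c} birds in the picture" for c in count_bins]

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("wetland_scene.jpg")  # placeholder image path
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

    with torch.no_grad():
        outputs = model(**inputs)

    # Image-text similarity over the ordered prompts gives a classification
    # over count bins; the expected value yields a count estimate.
    probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
    estimate = sum(p.item() * c for p, c in zip(probs, count_bins))
    print(f"Estimated bird count: {estimate:.1f}")

The point of the sketch is only the reframing: no bird is localized, yet aligning numerical text semantics with image features produces a usable count signal.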
"We compiled and released the Wetland-Bird-Count, a novel optical image dataset specifically designed for coastal wetland avian population assessment of the Yellow River Delta, filling a critical gap in ecological monitoring resources," adds Liu. “Experimental results on the Wetland-Bird-Count dataset, which contains optical images from coastal wetlands in the Yellow River Delta, show that the proposed method achieves a MAE of 45.2 and an MSE of 54.2, outperforming existing weakly supervised and unsupervised methods and achieving comparable results to fully supervised methods.”
The study demonstrates that weakly supervised cluster counting driven by optical image visual cues can improve the accuracy of bird-flock counting under lightweight annotation, providing a reliable quantitative analysis tool for optical image-based ecological monitoring.
###
Contact the author: Chang Liu, Institute of Applied Mathematics, Beijing Information Science and Technology University, Beijing, China, liu.chang.cn@ieee.org
The publisher KeAi was established by Elsevier and China Science Publishing & Media Ltd to unfold quality research globally. In 2013, our focus shifted to open access publishing. We now proudly publish more than 200 world-class, open access, English language journals, spanning all scientific disciplines. Many of these are titles we publish in partnership with prestigious societies and academic institutions, such as the National Natural Science Foundation of China (NSFC).
Journal
Watershed Ecology and the Environment
Method of Research
Computational simulation/modeling
Subject of Research
Animals
Article Title
Weakly supervised bird-flock counting in wetlands based on multimodal optical image perception
COI Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.