image: Using a low-resolution image captured by a mobile phone or downsampled from a large-scene dataset, the new method MILI (multi-person inference from a low-resolution image) can achieve more accurate multi-person reconstruction than a state-of-the-art (SOTA) method.
Credit: The Authors
Accurately estimating 3D poses and body shapes from a single image is critical for several applications, such as behavior analysis and security alerts. Unfortunately, many existing multi-person reconstruction methods need the people in the photo to be clearly visible in order to extract enough information. This becomes a problem when cameras have limited resolution and the field of view is widened to capture individuals in distant areas, producing low-resolution images that carry little information.
To address that limitation, a research team from Tianjin University and Cardiff University attempted to reconcile the conflict between image resolution and estimation accuracy. As reported in the KeAi journal Fundamental Research, the team proposed an end-to-end multi-task machine learning framework known as MILI (multi-person inference from a low-resolution image) that enables accurate multi-person 3D pose and shape representation from a low-resolution image.
Further, to tackle the occlusion issue in multi-person scenes, the researchers devised an occlusion-aware mask prediction network that estimates the mask of each person's mesh during regression. Paired high- and low-resolution images were also used for training.
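As a rough illustration of what a paired high- and low-resolution training sample can look like, the sketch below derives a low-resolution copy of an image by block averaging. It is not taken from the MILI code: the release does not say how the team built its pairs, so the downsampling scheme, the function names `downsample` and `make_training_pair`, and the `factor` parameter are all assumptions made for this example.

```python
def downsample(image, factor):
    """Block-average a 2D grayscale image (a list of rows) by an integer factor.

    Assumption: simple block averaging stands in for whatever
    downsampling the authors actually used.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    low = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Collect the factor x factor block and replace it with its mean.
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        low.append(row)
    return low


def make_training_pair(high_res, factor=2):
    """Return a hypothetical (high_res, low_res) training pair."""
    return high_res, downsample(high_res, factor)


# Tiny 4x4 grayscale example.
high = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
]
_, low = make_training_pair(high, factor=2)
print(low)  # [[2.0, 6.0], [10.0, 20.0]]
```

In a setup like this, the high-resolution image and its labels could supervise a network that only sees the low-resolution copy at test time. Whether MILI pairs its images this way is not stated in the release.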
"In both small-scale and large-scale scenes, MILI outperformed the state-of-the-art methods both quantitatively and qualitatively," said Kun Li, lead author of the study. “Different from the existing work, MILI, as an end-to-end network, encourages the multi-person reconstruction even from low-resolution images and significantly improves the robustness to occlusions with the occlusion-aware mask prediction network by refining the detection stage with segmentation.”
The code is available at http://cic.tju.edu.cn/faculty/likun/projects/MILI.
"Reconstruction of 3D poses and shapes for the individuals in a surveillance scene will allow for better recognition of actions/activities, including the interaction between people, modeling crowd behavior for simulations and security monitoring, and better tracking of individuals over time," concluded Li.
###
Contact the corresponding authors: Kun Li, lik@tju.edu.cn; Jingyu Yang, yjy@tju.edu.cn
The publisher KeAi was established by Elsevier and China Science Publishing & Media Ltd to unfold quality research globally. In 2013, our focus shifted to open access publishing. We now proudly publish more than 100 world-class, open access, English language journals, spanning all scientific disciplines. Many of these are titles we publish in partnership with prestigious societies and academic institutions, such as the National Natural Science Foundation of China (NSFC).
Journal
Fundamental Research
Method of Research
Imaging analysis
Subject of Research
People
Article Title
MILI: Multi-person inference from a low-resolution image
COI Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.