Image: Low-level visual semantics (left): structural constraints between neighboring pixels; middle-level visual semantics (middle): structural regions corresponding to façades, footprints, and roofs; high-level visual semantics (right): building height estimation, where double-bounce regions (red) and different regions M (yellow) correspond to different lengths L (thin dashed lines in different colors) associated with building heights.
Credit: ©Science China Press
This study was led by Professors Zhanyi Hu and Qiulei Dong (Institute of Automation, Chinese Academy of Sciences). TomoSAR (SAR tomography) is a promising technique for handling the layover phenomenon and reconstructing high-resolution 3D structures of targets from a stack of coregistered SAR images. However, the side-looking imaging geometry of SAR and numerous interference factors pose significant challenges, including:
(1) only a sparse set of 3D points is reconstructed, resulting in an incomplete representation of building structures;
(2) the recovered 3D points contain large outliers, which hamper subsequent building modeling;
(3) elevations are computed pixel by pixel, which is inefficient;
(4) component semantics such as façades and roofs are not parsed, which further limits realistic building modeling.
To tackle these problems, Professor Hu says, “It is desirable and promising to fully exploit the visual semantics embedded in SAR images, because such semantics are a factual description of the current scene and are more informative and useful than the general priors commonly adopted in the literature. The key is how to extract and exploit such scene semantics.”
Professors Hu and Dong, together with team members Wei Wang, Liankun Yu and Haixia Wang, then began to explore which SAR visual semantics could be extracted and how they could be used in TomoSAR. Based on a systematic analysis of the characteristics of SAR images, they proposed three levels (low, middle, and high) of SAR visual semantics as the prime candidates:
(1) Low-level visual semantics typically exist at the pixel level and are characterized by differences in intensity and structure type between neighboring pixels, as well as by the intensity distributions of pixels in different regions. Their main applications include (i) detecting the initial position or shape of targets (e.g., façades and roofs corresponding to double-bounce regions), (ii) improving the accuracy of pixel-level structure inference by using the structural similarity between neighboring pixels, and (iii) processing neighboring pixels jointly to make structure inference more efficient.
(2) Middle-level visual semantics typically take the form of specific geometric primitives (e.g., line segments and rectangles) that represent the local shape and size of a target. Their main applications include (i) detecting target categories and spatial relationships and (ii) inferring the underlying structures of targets.
(3) High-level visual semantics refer to the category and global geometric information (e.g., height and layout) of single or multiple targets. Their main applications include (i) understanding the global structure of scenes containing targets of different positions and sizes and (ii) constructing higher-order constraints that improve the global accuracy and efficiency of pixel-level 3D reconstruction.
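The figure links building height to the lengths L of image regions. The article does not give the exact relation the team uses, so the snippet below is only an illustrative sketch based on the textbook flat-terrain, far-field layover geometry, which is an assumption here. Under that geometry, a vertical wall of height H viewed at incidence angle θ occupies H·cos θ in slant range and H/tan θ in ground range. The function names are hypothetical.

```python
import math


def height_from_slant_layover(slant_length_m: float, incidence_deg: float) -> float:
    """Hypothetical helper: wall height from its layover extent in slant range.

    Assumes flat terrain and far-field geometry, where a vertical wall of
    height H is compressed to H * cos(theta) in slant range.
    """
    theta = math.radians(incidence_deg)
    return slant_length_m / math.cos(theta)


def height_from_ground_layover(ground_length_m: float, incidence_deg: float) -> float:
    """Hypothetical helper: same relation after projection to ground range.

    Under the same assumptions, the ground-range extent is H / tan(theta),
    so H = L * tan(theta).
    """
    theta = math.radians(incidence_deg)
    return ground_length_m * math.tan(theta)


if __name__ == "__main__":
    # A 30 m wall seen at 40 degrees incidence spans about 22.98 m in slant range.
    print(round(height_from_slant_layover(22.98, 40.0), 1))
```

Which length is measured (slant or ground range) and which angle applies depend on the processing chain. Treat this as a back-of-the-envelope check, not as the authors' estimator.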
In addition, to validate the effectiveness and efficiency of the proposed SAR visual semantics in TomoSAR, the team introduced an efficient 3D reconstruction method that produces box-like models of buildings. In this method, elevations are initially computed for only a small fraction (5%) of sampled pixels. Realistic box-like 3D models are then obtained in subsequent steps by taking the higher-level semantics into account.
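The idea of computing elevations for only a sparse sample of pixels and then enforcing a region-level model can be illustrated with a minimal sketch. It makes several assumptions that are not from the article. The building region (e.g., a roof rectangle from middle-level semantics) is given as a list of pixels. `estimate_elevation` is a hypothetical per-pixel TomoSAR elevation estimator passed in as a callable. A simple median stands in for the team's higher-level constraints, because a median tolerates a minority of outlier elevations.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int]


def sample_pixels(region: Sequence[Pixel], ratio: float = 0.05, seed: int = 0) -> List[Pixel]:
    """Randomly pick about `ratio` of the region's pixels (at least one)."""
    if not region:
        return []
    k = max(1, round(ratio * len(region)))
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(list(region), k)


def box_height(
    region: Sequence[Pixel],
    estimate_elevation: Callable[[Pixel], float],  # hypothetical per-pixel estimator
    ratio: float = 0.05,
    seed: int = 0,
) -> float:
    """Assign one height to the whole region from a sparse pixel sample.

    Only the sampled pixels are passed to the (expensive) elevation
    estimator. The median then acts as a stand-in region-level constraint
    that suppresses a minority of outlier elevations.
    """
    samples = sample_pixels(region, ratio, seed)
    if not samples:
        raise ValueError("region contains no pixels")
    elevations = [estimate_elevation(p) for p in samples]
    return statistics.median(elevations)


if __name__ == "__main__":
    # Synthetic 20 x 20 roof at 30 m; 10% of pixels return a 200 m outlier.
    roof = [(r, c) for r in range(20) for c in range(20)]
    noisy = lambda p: 200.0 if (p[0] + p[1]) % 10 == 0 else 30.0
    print(len(sample_pixels(roof)))             # 20 of 400 pixels are evaluated
    print(box_height(roof, noisy, ratio=1.0))   # 30.0 when all pixels are used
```

The median is a deliberately simple choice. It shows why a region-level model can be both cheaper (fewer per-pixel estimates) and more robust (outliers are voted out) than independent pixel-wise inversion, but it is not the method described in the paper.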
This work is only a small step in the newly emerging framework of semantics-based SAR 3D imaging, and more systematic investigation is needed, for example, into how to exploit the “implicit representations of scene semantics” commonly obtained by deep learning. Although the new framework is promising, some caution is needed, as Prof. Hu emphasized: “If too much effort is put into SAR semantics extraction and exploitation, their desired roles in SAR 3D imaging will be largely compromised; hence, traditional data collection and this newly advocated semantics exploitation should be balanced.” The team is currently working on modeling more complex scenes, such as cluttered buildings with various roof structures.
See the article:
Exploiting SAR Visual Semantics in TomoSAR for 3D Modeling of Buildings
http://engine.scichina.com/doi/10.1360/nso/20230067
Journal
National Science Open