image: Comparison of depth perception results across different models for autonomous driving. Each column shows the same driving scene under challenging conditions. Competing methods (SCSNet and SE-CFF) often blur object boundaries or miss fine details, while URNet produces clearer and smoother distance maps. The yellow boxes highlight regions where URNet better preserves shapes—like pedestrians and roadside barriers—demonstrating its stronger ability to perceive precise depth even in low-texture or complex areas.
Credit: Visual Intelligence, Tsinghua University Press
Imagine driving through a dark tunnel or along a rainy highway—your eyes can still sense motion and distance, but even advanced vehicle cameras often struggle under such conditions. Traditional cameras capture images frame by frame, which can cause motion blur or the loss of important details when the vehicle moves quickly or lighting is poor. To address this, scientists are turning to event cameras, a new type of sensor inspired by the human eye. Instead of recording conventional color images (known as RGB images), an event camera detects only tiny changes in brightness at each pixel, allowing it to capture motion hundreds of times faster than ordinary cameras and function even in low light.
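To make the idea of an event stream concrete, here is a minimal, illustrative Python sketch of how event-camera data is commonly represented and densified before a neural network can use it. The array layout, example values, and the events_to_frame helper are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np

# Each event from an event camera is a tuple (x, y, timestamp, polarity):
# the pixel location, the time of the brightness change (in microseconds),
# and whether brightness increased (+1) or decreased (-1).
# Hypothetical example stream; real sensors emit millions of events per second.
events = np.array(
    [
        (120, 45, 1_000_005, +1),
        (121, 45, 1_000_012, +1),
        (300, 210, 1_000_020, -1),
    ],
    dtype=[("x", np.int32), ("y", np.int32), ("t", np.int64), ("p", np.int8)],
)

def events_to_frame(events, height, width):
    """Accumulate sparse events into a dense 2D frame so a standard
    neural network can process them (one common practice in event-based
    vision; the paper may use a different representation)."""
    frame = np.zeros((height, width), dtype=np.float32)
    np.add.at(frame, (events["y"], events["x"]), events["p"].astype(np.float32))
    return frame

frame = events_to_frame(events, height=480, width=640)
```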
However, the sparsity and noise of event streams pose significant challenges to accurate depth prediction. That’s where URNet (Uncertainty-aware Refinement Network) comes in. Developed by researchers at the Technical University of Munich, URNet transforms these rapid, flickering signals into accurate 3D “depth maps,” which are essentially digital distance maps showing how far away every object is from the vehicle.
URNet’s core innovation lies in how it processes information through local-global refinement and uncertainty-aware learning. First, URNet focuses on local refinement—using convolutional layers to recover fine-grained details such as the edges of cars, road markings, or pedestrians from sparse event signals. Then, in the global refinement stage, the model applies a lightweight attention mechanism to capture the broader structure of the scene, ensuring that local predictions are consistent with the overall environment. This strategy allows the network to understand both precise textures and the big picture of the driving scene.

At the same time, URNet incorporates uncertainty-aware learning, meaning it not only predicts depth but also estimates how reliable each prediction is. For every pixel, the network produces a confidence score that reflects its certainty. When confidence is low—such as during glare, rain, or strong shadows—the system automatically adjusts its response, for example by slowing down, using other sensors, or prioritizing safer decisions. This built-in self-assessment makes the model more robust and trustworthy in unpredictable real-world conditions.
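For readers who think in code, the following PyTorch-style sketch illustrates the three ideas just described: convolutional local refinement, lightweight attention for global context, and a per-pixel uncertainty output. It is an illustration under stated assumptions, not the authors' URNet implementation; the module names, layer sizes, and the uncertainty-weighted loss are all stand-ins.

```python
import torch
import torch.nn as nn

class URNetStyleRefiner(nn.Module):
    """Illustrative sketch only -- not the authors' actual URNet architecture."""

    def __init__(self, channels: int = 32, num_heads: int = 4):
        super().__init__()
        # Local refinement: small convolutions recover fine detail such as
        # object edges from sparse event features.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Global refinement: self-attention over flattened feature tokens keeps
        # local predictions consistent with the overall scene (lightweight here
        # only because the feature map is assumed small).
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Two output maps: predicted depth and a log-variance that encodes
        # how uncertain the network is at each pixel.
        self.head = nn.Conv2d(channels, 2, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        b, c, h, w = feats.shape
        local = self.local(feats) + feats                # residual local detail
        tokens = local.flatten(2).transpose(1, 2)        # (B, H*W, C)
        ctx, _ = self.attn(tokens, tokens, tokens)       # scene-wide context
        fused = local + ctx.transpose(1, 2).reshape(b, c, h, w)
        depth, log_var = self.head(fused).chunk(2, dim=1)
        return depth, log_var                            # confidence ~ exp(-log_var)

def uncertainty_weighted_loss(depth, log_var, target):
    """A commonly used uncertainty-weighted regression loss (an assumption,
    not necessarily the paper's formulation): pixels the network marks as
    uncertain contribute less, but large log-variance is itself penalized."""
    return (torch.exp(-log_var) * (depth - target).abs() + log_var).mean()

# Hypothetical usage with random stand-in features from an event-based stereo backbone.
feats = torch.randn(1, 32, 30, 40)
depth, log_var = URNetStyleRefiner()(feats)
```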
Experimental results on the DSEC dataset, one of the most comprehensive benchmarks for event-based stereo vision, show that URNet consistently produces clearer and more stable depth maps than state-of-the-art models, especially in fast-motion or low-light scenarios, and that it achieves superior results across multiple evaluation metrics. The system also proved computationally efficient, offering a strong trade-off between accuracy and runtime speed. Compared with leading baselines such as SE-CFF and SCSNet, URNet improved performance by a significant margin while keeping parameter counts low, making it suitable for practical deployment.
“Event cameras provide unprecedented temporal resolution, but harnessing their data for reliable depth estimation has been a major challenge,” said Dr. Hu Cao, one of the lead authors. “With URNet, we introduce uncertainty-aware refinement, giving depth prediction both precision and reliability.”
By combining high-speed event-based sensing with a confidence-aware learning mechanism, URNet represents a new step forward in intelligent perception for autonomous vehicles—enabling them to understand, evaluate, and react to the world around them with greater safety and reliability. The technology could significantly improve autonomous driving safety, particularly in challenging environments such as night driving, tunnels, or heavy rain. It could also enhance advanced driver-assistance systems (ADAS) and future vehicle perception platforms designed to handle unpredictable lighting and motion conditions.
Funding information
This work was supported by the MANNHEIM-CeCaS project (No. 16ME0820).
About the Authors
Prof. Dr.-Ing. habil. Alois Knoll, Chair of Robotics, Artificial Intelligence and Real-time Systems at the Technical University of Munich. His research interests include cognitive, medical and sensor-based robotics, multi-agent systems, data fusion, adaptive systems, and multimedia information retrieval. He initiated and served as program chairman of the First IEEE/RAS Conference on Humanoid Robots (IEEE-RAS/RSJ Humanoids 2000), was general chair of IEEE Humanoids 2003 and of Robotik 2004, the largest German conference on robotics, and has served on several other organizing committees. Prof. Knoll is a member of the German Society for Computer Science and an IEEE Fellow.
Dr. rer. nat. Hu Cao, Postdoctoral Researcher at the Chair of Robotics, Artificial Intelligence and Real-time Systems, Technical University of Munich. His research interests include autonomous driving, robotic grasping, dense prediction, and event-based vision.
About Visual Intelligence
Visual Intelligence is an international, peer-reviewed, open-access journal devoted to the theory and practice of visual intelligence. It is the official publication of the China Society of Image and Graphics (CSIG), with Article Processing Charges fully covered by the Society. The journal highlights innovative research in computer vision, artificial intelligence, and intelligent perception technologies that bridge science and engineering.
Journal
Visual Intelligence
Article Title
URNet: uncertainty-aware refinement network for event-based stereo depth estimation
Article Publication Date
2-Oct-2025