From the first stereoscope, invented in the Victorian era, to the latest Oculus Quest 2, advances in optics and electrical engineering have gradually blurred the boundary between virtuality and reality. Looking forward, computer-generated holography (CGH) is believed to be the next revolutionary technology for virtual and augmented reality. By digitally recording the wavefront emitted by a virtual or real object, CGH can reproduce the object’s 3D appearance with a physical depth of field. Existing algorithms follow two strategies to compute such a hologram: physical simulation followed by phase-only encoding, or iterative phase retrieval, both designed to comply with the phase-only spatial light modulators (SLMs) in common use. Each method has its pros and cons. The former is fast, but its phase-only encoding requires manual tuning of filtering parameters to achieve the sharpest yet artifact-free imagery; the latter is slow due to the iterative process, yet produces holograms end-to-end without human intervention.
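The article does not name a specific phase-retrieval algorithm; the Gerchberg-Saxton method is the textbook example of the iterative strategy described above. A minimal NumPy sketch (assuming a simple Fourier-hologram setup, where the SLM and image planes are related by a 2D FFT):

```python
import numpy as np

def gerchberg_saxton(target_amplitude, num_iters=50, seed=0):
    """Iterative phase retrieval: find an SLM phase pattern whose
    far-field (Fourier-plane) amplitude matches the target image."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, target_amplitude.shape)
    for _ in range(num_iters):
        # Propagate a unit-amplitude, phase-only field to the image plane.
        field = np.exp(1j * phase)
        image = np.fft.fft2(field)
        # Enforce the target amplitude, keep the phase that arrived.
        image = target_amplitude * np.exp(1j * np.angle(image))
        # Propagate back and re-impose the phase-only SLM constraint.
        field = np.fft.ifft2(image)
        phase = np.angle(field)
    return phase

# Toy usage on a small synthetic target.
target = np.abs(np.fft.fft2(np.random.rand(64, 64)))
phase = gerchberg_saxton(target / target.max(), num_iters=20)
```

The per-iteration FFTs over the full frame, repeated tens of times, are what makes this family of methods slow relative to a single feed-forward simulation pass.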
Recently, convolutional neural networks trained with supervised and unsupervised learning have been adopted to accelerate these two methods, respectively. Despite significant speed-ups, the improved algorithms inherit their parent methods' respective pros and cons, limiting each to its particular use cases. In a new paper published in Light: Science & Applications, a team of scientists led by Professor Wojciech Matusik and Ph.D. student Liang Shi from the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology, together with coworkers, proposed an AI-rendering system that uses supervised-plus-unsupervised two-stage training to combine the merits of both methods. In this work, the researchers first introduced the layered depth image (LDI) as a learning-compatible, rendering-efficient, and memory-compact 3D input representation, in place of an RGB-D image or a voxel grid. They showed that the LDI leads to a more faithful reproduction of occlusion boundaries and makes hologram rendering more robust to real-world 3D inputs with imperfect depth maps. Using this new representation, the team created MIT-CGH-4K-V2, a large-scale 3D hologram dataset offering unprecedented 3D hologram quality for training machine-learning-based hologram rendering systems.
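To make the LDI idea concrete, the toy class below stores several color/depth samples per pixel, so surfaces hidden behind the frontmost one (the cause of occlusion artifacts with plain RGB-D input) are retained. The layer count, field names, and methods are illustrative choices, not the paper's exact format:

```python
import numpy as np

class LayeredDepthImage:
    """Toy layered depth image: per pixel, up to `num_layers`
    color/depth samples ordered front (layer 0) to back.
    Empty samples are marked with NaN depth."""

    def __init__(self, height, width, num_layers=3):
        self.color = np.zeros((num_layers, height, width, 3), dtype=np.float32)
        self.depth = np.full((num_layers, height, width), np.nan, dtype=np.float32)

    def insert(self, layer, y, x, rgb, z):
        """Place one color/depth sample at pixel (y, x) on a given layer."""
        self.color[layer, y, x] = rgb
        self.depth[layer, y, x] = z

    def front_rgbd(self):
        """Collapse to an ordinary RGB-D image by keeping the first
        non-empty sample per pixel; shows that RGB-D is just the
        single-layer special case of an LDI."""
        num_layers, height, width = self.depth.shape
        rgb = np.zeros((height, width, 3), dtype=np.float32)
        z = np.full((height, width), np.nan, dtype=np.float32)
        for layer in range(num_layers - 1, -1, -1):  # back to front
            mask = ~np.isnan(self.depth[layer])
            rgb[mask] = self.color[layer][mask]      # front layers overwrite
            z[mask] = self.depth[layer][mask]
        return rgb, z
```

Because layers beyond the first are sparsely populated, an LDI stays far more memory-compact than a dense voxel grid while still carrying the occluded geometry an RGB-D image discards.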
The trained AI renderer allows end-to-end synthesis of phase-only 3D holograms in real time (60 Hz) on consumer desktops and interactively (5 Hz) on cellphones, making holographic rendering more accessible than ever and bringing low-power, real-time mobile computation within reach in the coming years. Through a user-in-the-loop calibration, the AI renderer also jointly corrects a user’s vision aberrations, presenting an opportunity to eliminate the need for prescription glasses. The scientists summarize the contributions and implications of their study below:
“Holographic 3D displays provide interactive experiences that set them apart from cell phones and stereoscopic augmented reality (AR) and virtual reality (VR) displays. Our work takes a step towards the end-to-end synthesis of 3D phase-only holograms. It is fully automatic, robust to rendered and misaligned real-world inputs, produces realistic depth boundaries, and corrects vision aberrations. Reusing the minimalistic CNN architecture of our previous work, it runs efficiently on both workstations and edge devices, promising real-time mobile performance in future generations of tethered and untethered AR/VR headsets and glasses.”
“Researchers will greatly benefit from the MIT-CGH-4K-V2 dataset and the open-sourced implementation for reproducing our results and developing new applications such as joint hologram synthesis and compression, holographic fluorescent microscopy, neural photostimulation, and many more. Through this work, we hope to draw the community's attention to the ongoing trend of physics-guided learning approaches for holographic and computational optics applications.”
Light: Science & Applications