News Release

New system combines smartphone videos to create 4D visualizations

Carnegie Mellon approach requires neither studio nor specialized cameras

Peer-Reviewed Publication

Carnegie Mellon University

Creating a Virtual Camera

Image: By combining video of the same scene from several cameras, Carnegie Mellon University researchers can create a "virtual camera" that enables users to view the scene from various angles or to remove people from the scene.

Credit: Carnegie Mellon University

PITTSBURGH--Researchers at Carnegie Mellon University have demonstrated that they can combine iPhone videos shot "in the wild" by separate cameras to create 4D visualizations that allow viewers to watch action from various angles, or even erase people or objects that temporarily block sight lines.

Imagine a visualization of a wedding reception, where dancers can be seen from as many angles as there were cameras, and the tipsy guest who walked in front of the bridal party is nowhere to be seen.

The videos can be shot independently from a variety of vantage points, as might occur at a wedding or birthday celebration, said Aayush Bansal, a Ph.D. student in CMU's Robotics Institute. It is also possible to record actors in one setting and then insert them into another, he added.

"We are only limited by the number of cameras," Bansal said. There is no upper limit on how many video feeds can be used.

Bansal and his colleagues presented their 4D visualization method at the Computer Vision and Pattern Recognition virtual conference last month.

"Virtualized reality" is nothing new, but in the past it has been restricted to studio setups, such as CMU's Panoptic Studio, which boasts more than 500 video cameras embedded in its geodesic walls. Fusing visual information of real-world scenes shot from multiple, independent, handheld cameras into a single comprehensive model that can reconstruct a dynamic 3D scene simply hasn't been possible.

Bansal and his colleagues worked around that limitation by using convolutional neural nets (CNNs), a type of deep learning program that has proven adept at analyzing visual data. They found that scene-specific CNNs could be used to compose different parts of the scene.

The CMU researchers demonstrated their method using up to 15 iPhones to capture a variety of scenes -- dances, martial arts demonstrations and even flamingos at the National Aviary in Pittsburgh.

"The point of using iPhones was to show that anyone can use this system," Bansal said. "The world is our studio."

The method also unlocks a host of potential applications in the movie industry and consumer devices, particularly as the popularity of virtual reality headsets continues to grow.

Though the method doesn't necessarily capture scenes in full 3D detail, the system can limit playback angles so incompletely reconstructed areas are not visible and the illusion of 3D imagery is not shattered.
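The idea of limiting playback angles can be illustrated with a minimal sketch. It is not the CMU team's implementation: the function name, the margin parameter and the assumption that the cameras sit along a single, non-wrapping arc of horizontal angles are all hypothetical, chosen only to show how a viewer's requested angle might be kept inside the range the real cameras covered.

```python
def clamp_view_angle(requested_deg, camera_angles_deg, margin_deg=0.0):
    """Clamp a requested viewing angle to the arc covered by the real cameras.

    Hypothetical helper: assumes every angle is a horizontal angle in degrees
    on one non-wrapping arc (for example, -90 to 90 around the scene).
    """
    if not camera_angles_deg:
        raise ValueError("at least one camera angle is required")
    # The outermost cameras bound the region that was actually observed.
    lowest = min(camera_angles_deg) - margin_deg
    highest = max(camera_angles_deg) + margin_deg
    # Requests outside that region are pulled back to the nearest bound.
    return max(lowest, min(highest, requested_deg))


# Three phones placed at -40, 0 and 35 degrees around a dance floor.
phones = [-40.0, 0.0, 35.0]
print(clamp_view_angle(10.0, phones))   # inside the covered arc, unchanged
print(clamp_view_angle(120.0, phones))  # outside the arc, pulled back to 35.0
```

In this sketch, a viewer who drags the virtual camera past the outermost phone simply stops at that phone's angle, so regions that no camera recorded are never shown.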

###

In addition to Bansal, the research team included Robotics Institute faculty members Yaser Sheikh, Deva Ramanan and Srinivasa Narasimhan. The team also included Minh Vo, a former Ph.D. student who now works at Facebook Reality Lab. The National Science Foundation, Office of Naval Research and Qualcomm supported this research.

Video: https://www.youtube.com/watch?v=quovnDPwL1k&feature=youtu.be

