BELLINGHAM, Washington, USA -- Those formerly silent walls can "talk" now: Researchers have demonstrated a simple optical technique by which audio information can be extracted from high-speed video recordings. The method uses an image-matching process based on vibration from sound waves, and is reported in an article appearing in the November issue of the journal Optical Engineering, published by SPIE, the international society for optics and photonics.
"One of the intriguing aspects of the paper is the ability to recover spoken words from a video of objects in the room," said journal Associate Editor Reiner Eschbach, a Research Fellow at Xerox Corp. "The paper shows that the sound creates minute vibrations in objects and that these vibrations, given the right equipment, can be picked up from a video signal. This is an interesting foray into a new application space and will, in my view, trigger interesting research in the field."
The article, "Audio extraction from silent high-speed video using an optical technique," was authored by Zhaoyang Wang, Hieu Nguyen, and Jason Quisberth of the Department of Engineering of the Catholic University of America, and is available from the SPIE Digital Library.
The technique is based on the fact that sound waves are mechanical waves that cause air to vibrate as they travel, the paper notes. That vibration can in turn cause objects in the sound's path to vibrate, especially if the objects are lightweight, thin, and flexible, such as a piece of paper. The vibrations, although usually small in amplitude, can be detected and analyzed algorithmically, and the audio reconstructed from those calculations.
The authors used a subset-based image-correlation approach to detect the motions of points on the surface of an object, capturing target images with a high-speed camera and applying the Gauss-Newton algorithm and a few other measures to achieve very fast and highly accurate image matching. Because the detected vibrations are directly related to sound waves, a simple model was used to reconstruct the original audio information of the sound waves.
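For readers who want a feel for how subset-based image matching can recover sub-pixel motion, the following Python sketch may help. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bilinear_sample`, `track_subset`, `displacement_to_audio`), the translation-only motion model, the subset size, and the mean-removal-and-scaling step that turns displacements into an audio-like signal are all hypothetical choices made for this example.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img at fractional coordinates using bilinear interpolation."""
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    fy, fx = ys - y0, xs - x0
    return ((1 - fy) * (1 - fx) * img[y0, x0]
            + (1 - fy) * fx * img[y0, x0 + 1]
            + fy * (1 - fx) * img[y0 + 1, x0]
            + fy * fx * img[y0 + 1, x0 + 1])

def track_subset(ref, cur, center, half=10, iters=20, tol=1e-4):
    """Estimate the (dy, dx) shift of a square subset between two frames.

    Assumption: the subset only translates between frames. The shift is
    refined with Gauss-Newton steps that minimize the squared intensity
    difference between the reference subset and the shifted current frame.
    """
    cy, cx = center
    rows = slice(cy - half, cy + half + 1)
    cols = slice(cx - half, cx + half + 1)
    ys, xs = np.mgrid[rows, cols].astype(float)
    template = ref[rows, cols]

    # Jacobian of the residual, approximated by the reference-image gradient.
    gy, gx = np.gradient(ref)
    J = np.stack([gy[rows, cols].ravel(), gx[rows, cols].ravel()], axis=1)
    H = J.T @ J  # 2x2 Gauss-Newton normal matrix

    p = np.zeros(2)  # current estimate of (dy, dx)
    for _ in range(iters):
        residual = (bilinear_sample(cur, ys + p[0], xs + p[1]) - template).ravel()
        step = np.linalg.solve(H, J.T @ residual)
        p -= step
        if np.linalg.norm(step) < tol:
            break
    return p

def displacement_to_audio(displacements):
    """Hypothetical 'simple model': remove the mean and scale to [-1, 1]."""
    signal = np.asarray(displacements, dtype=float)
    signal = signal - signal.mean()
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal
```

In use, one would call `track_subset` on each frame of the high-speed video against a fixed reference frame, collect one displacement component per frame, and pass that sequence to `displacement_to_audio`; the camera frame rate then serves as the audio sampling rate.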
While other recent work in the area reports on more sophisticated techniques to compute motion signals, the authors chose a simpler image-matching approach to measure vibration. Because light can travel through air considerably farther than sound and can pass through glass, they anticipate that the technique may find applications such as the passive detection of conversations inside a building from a long distance, Wang said. "We are currently improving the technique to increase its accuracy and sensitivity, make the measurements in real-time, and remove interference from other sources."
Optical Engineering is published in print and digitally in the SPIE Digital Library, which contains more than 420,000 articles from SPIE journals and proceedings, as well as more than 200 eBooks. Abstracts are freely searchable, and an increasing number of full articles in the society's 10 peer-reviewed journals are published with open access. Approximately 18,000 new research papers, eBooks, and other publications are added each year.
Michael Eismann, Senior Scientist for Electro-Optical and Infrared Sensors at the Sensors Directorate of the U.S. Air Force Research Lab, is the journal's Editor-in-Chief.
SPIE is the international society for optics and photonics, a not-for-profit organization founded in 1955 to advance light-based technologies. The Society serves nearly 256,000 constituents from approximately 155 countries, offering conferences, continuing education, books, journals, and a digital library in support of interdisciplinary information exchange, professional networking, and patent precedent. SPIE provided more than $3.2 million in support of education and outreach programs in 2013.