Image: Dvir Samuel, a PhD student at Bar-Ilan University
Credit: Courtesy of Bar-Ilan University
Bar-Ilan University announced today that a team from its Department of Computer Science has achieved a breakthrough in video processing that significantly simplifies the separation of foreground objects from their backgrounds, without the need for extensive training or optimization. The new method, called OmnimatteZero, was developed by Dr. Dvir Samuel and Prof. Gal Chechik, who also serves as a senior director of AI research at NVIDIA.
Recently presented at the SIGGRAPH Asia conference, the research addresses the challenge of extracting objects or figures from their backgrounds while preserving complex elements such as fur, hair, foliage, shadows, reflections, smoke, or rippling water. Current video layer separation methods rely on artificial intelligence models that must be trained on millions of labeled examples, or on heavy optimization procedures; both approaches are resource-intensive and time-consuming. The Bar-Ilan team's research demonstrates that equivalent results can be achieved with significantly less effort, computation, and cost.
"In video decomposition systems, the algorithm must identify the effects an object imposes on the scene, and then remove or extract it in a way that looks natural," explained Dr. Dvir Samuel, who led this research as a doctoral student under Prof. Chechik's supervision. "Until now, every method required millions of examples to train a learning model, as well as very large computational power and energy. Even once the model was fully trained and ready to use, running it to achieve the desired result could still take several minutes for just a few seconds of video."
The approach functions as a "visual compositing system" that enables content recycling. For example, a swan in a lake can be extracted complete with its reflection and seamlessly placed in a different pool, while the lake itself, minus the swan, can be reused as a background for another scene, with natural-looking results including consistent reflections, shadows, and movement.
Unlike existing methods, OmnimatteZero avoids expensive supervised training and self-supervised optimization. Instead, it leverages image-completion (inpainting) techniques traditionally applied to static images, extended with modules that track changes across time and space so that the reconstructed background remains consistent. The researchers demonstrated that objects and their traces can be identified without any training, using the model's built-in self-attention mechanism, which links regions within and across video frames.
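The attention-based idea can be illustrated with a short, hypothetical sketch. Assuming one can read out a video model's averaged spatio-temporal self-attention map (faked below as a random tensor) and a rough mask of the object, the attention flowing from each token to the object tokens gives a training-free estimate of the object's "footprint" (shadow, reflection, ripples) across frames. The function name, tensor shapes, and threshold are illustrative assumptions, not the authors' implementation.

```python
import torch

def object_footprint_mask(attn, obj_mask, threshold=0.35):
    """
    Estimate an object's spatio-temporal footprint (shadows, reflections)
    from a video model's self-attention, without any training.

    attn:     (N, N) self-attention averaged over heads/layers, where
              N = T*H*W spatio-temporal tokens of the video latent.
    obj_mask: (N,) boolean mask marking tokens inside the object.
    Returns a boolean mask of tokens strongly coupled to the object.
    """
    # How much each token attends to the object tokens.
    attn_to_obj = attn[:, obj_mask].sum(dim=1)      # (N,)
    attn_to_obj = attn_to_obj / attn_to_obj.max()   # scale to [0, 1]

    # Tokens tied to the object (reflection, shadow) plus the object itself.
    return (attn_to_obj > threshold) | obj_mask

# --- toy usage with placeholder attention (stand-in for a real video model) ---
T, H, W = 8, 16, 16                        # frames x latent height x width
N = T * H * W
attn = torch.rand(N, N).softmax(dim=-1)    # placeholder self-attention map
obj_mask = torch.zeros(N, dtype=torch.bool)
obj_mask[: H * W // 4] = True              # pretend the object covers part of frame 0

mask = object_footprint_mask(attn, obj_mask)
print(f"{mask.sum().item()} of {N} tokens marked for removal/inpainting")
```

In this sketch, everything inside the returned mask would be handed to the inpainting step, so the object is removed together with its reflection or shadow rather than leaving them orphaned in the background.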
The team’s research demonstrates that dedicated models are not necessary for video layer separation, nor is exceptionally high computing power required. The method needs only an existing video generation model (such as WAN or VEO3) applied to the task. The study shows how current video generation models can be used to detect the effects an object creates in a scene and to remove, extract, and reinsert those objects and their effects into other videos in real time, eliminating the typical multi-minute processing wait.
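For the reuse side described above, a minimal sketch of one plausible approach is shown below: an extracted object layer is pasted onto a new background by masked blending of the two videos' latents before decoding. The tensor shapes, the blending rule, and the decoder call are assumptions for illustration; the article does not specify how OmnimatteZero performs its composition.

```python
import torch

def composite_layers(background_latent, object_latent, footprint_mask, alpha=1.0):
    """
    Paste an extracted object layer (object + its reflections/shadows)
    onto a different background, working directly on video latents.

    background_latent: (T, C, H, W) latent of the clean target background.
    object_latent:     (T, C, H, W) latent of the extracted object layer.
    footprint_mask:    (T, 1, H, W) soft mask covering the object and its effects.
    """
    # Blend the latents only where the object layer has support.
    return (background_latent * (1 - alpha * footprint_mask)
            + object_latent * (alpha * footprint_mask))

# --- toy usage with random tensors standing in for real encoded videos ---
T, C, H, W = 8, 4, 32, 32
background = torch.randn(T, C, H, W)   # e.g. the empty pool
swan_layer = torch.randn(T, C, H, W)   # e.g. the swan plus its reflection
mask = torch.rand(T, 1, H, W)          # soft footprint of the swan

new_video_latent = composite_layers(background, swan_layer, mask)
# A real pipeline would now decode with the video model's own decoder,
# e.g. frames = vae.decode(new_video_latent)  (model-specific, hypothetical)
print(new_video_latent.shape)
```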
The approach targets video editors and designers, content creators, advertisers, and AI researchers. The proof of feasibility suggests the technique could eventually become accessible for everyday use, including editing videos recorded on smartphones. Several university teams worldwide are already working to improve OmnimatteZero.
Dr. Samuel's next research direction will address sound synchronization. "For instance, if there's a barking dog in the video and we remove the dog, we don't want to keep hearing the barking in the background that remains without it," he explained.
The project was conducted in collaboration with researchers from the Hebrew University and the OriginAI Research Center, Israel.
For more information and examples, visit the OmnimatteZero website.