Image: The proposed model modifies Schrödinger bridge-type diffusion models so that an encoder adds noise to real data and a decoder reconstructs samples from the noise. Two objective functions, the prior loss and drift matching, reduce computational cost and prevent overfitting.
Credit: Institute of Science Tokyo
Researchers at Science Tokyo have developed a new framework for generative diffusion models that is cheaper to train and less prone to overfitting. The method reinterprets Schrödinger bridge models as variational autoencoders with infinitely many latent variables. By stopping the training of the encoder at an appropriate point, the approach enables more efficient generative AI and applies broadly beyond standard diffusion models.
Diffusion models are among the most widely used approaches in generative AI for creating images and audio. These models gradually add noise to real samples (noising) and then learn to reverse that process (denoising) to generate realistic new data. A widely used version, the score-based model, does this with a diffusion process that connects the data to the prior over a sufficiently long time interval. This method has a limitation, however: when the data differs strongly from the prior, the noising and denoising processes need longer time intervals, which slows down sample generation.
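The link between distance from the prior and noising time can be made concrete with a toy calculation. The sketch below assumes an Ornstein-Uhlenbeck forward process, dx = -x dt + sqrt(2) dW, whose stationary distribution is a standard normal prior; this process and the names `ou_mean` and `time_to_reach_prior` are illustrative choices, not taken from the study.

```python
import math


def ou_mean(x0, t):
    """Mean at time t of dx = -x dt + sqrt(2) dW started at x0 (assumed forward process)."""
    return x0 * math.exp(-t)


def time_to_reach_prior(x0, tol=0.01):
    """Smallest t at which the mean is within `tol` of the prior mean, which is 0."""
    if abs(x0) <= tol:
        return 0.0
    return math.log(abs(x0) / tol)


# Data points that start further from the prior need a longer noising interval.
for x0 in (1.0, 10.0, 100.0):
    print(f"x0 = {x0:6.1f}  ->  noising time ~ {time_to_reach_prior(x0):.2f}")
```

In this toy setting, each tenfold increase in the distance from the prior adds ln 10 (about 2.3) time units to the noising interval, and the denoising pass must cover the same interval in reverse.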
Now, a research team from Institute of Science Tokyo (Science Tokyo), Japan, has proposed a new framework for diffusion models that is faster and computationally less demanding. They achieved this by reinterpreting Schrödinger bridge (SB) models, a type of diffusion model, as variational autoencoders (VAEs).
The study was led by graduate student Mr. Kentaro Kaba and Professor Masayuki Ohzeki from the Department of Physics at Science Tokyo, in collaboration with Mr. Reo Shimizu (then a graduate student) and Associate Professor Yuki Sugiyama from the Graduate School of Information Sciences at Tohoku University, Japan. Their findings were published in Volume 7, Issue 3 of Physical Review Research on September 3, 2025.
SB models offer greater flexibility than standard score-based models because they can connect any two probability distributions over a finite time using a stochastic differential equation (SDE). This supports more complex noising processes and higher-quality sample generation. The trade-off, however, is that SB models are mathematically complex and expensive to train.
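As a minimal illustration of an SDE that connects two endpoints in finite time, the sketch below simulates a Brownian bridge, which is the Schrödinger bridge between two single points when the reference process is Brownian motion. It is a toy case only: the models in the study connect full distributions with learned drifts, and the function name `simulate_bridge` and its parameters are assumptions made for this example.

```python
import math
import random


def simulate_bridge(x_start, x_end, n_steps=1000, sigma=1.0, seed=0):
    """Euler-Maruyama simulation of dx = (x_end - x) / (1 - t) dt + sigma dW on [0, 1]."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    x = x_start
    path = [x]
    for i in range(n_steps):
        steps_left = n_steps - i               # equals (1 - t) / dt, kept as an integer
        drift_dt = (x_end - x) / steps_left    # drift times dt, pulls x toward x_end
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = x + drift_dt + noise
        path.append(x)
    return path


path = simulate_bridge(x_start=-3.0, x_end=5.0)
print(f"start = {path[0]:.2f}, end = {path[-1]:.2f}")
```

The drift grows as t approaches 1, so the path is steered onto the target by the end of the unit interval no matter how far apart the endpoints are, rather than relying on a long relaxation time.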
The proposed method addresses this by reformulating SB models as VAEs with infinitely many latent variables. “The key insight lies in extending the number of latent variables from one to infinity, leveraging the data-processing inequality. This perspective enables us to interpret SB-type models within the framework of VAEs,” says Kaba.
In this setup, the encoder represents the forward process that maps real data onto a noisy latent space, while the decoder reverses the process to reconstruct realistic samples, and both processes are modeled as SDEs learned by neural networks.
The model employs a training objective with two components. The first is the prior loss, which ensures that the encoder correctly maps the data distribution to the prior distribution. The second is drift matching, which trains the decoder to mimic the dynamics of the reverse encoder process. Moreover, once the prior loss stabilizes, encoder training can be stopped early. This shortens training, reduces the risk of overfitting, and preserves the high accuracy of SB models.
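The early-stopping idea can be sketched as a training loop that freezes the encoder once the prior loss flattens out. Everything here is hypothetical scaffolding: the plateau test, the callbacks `prior_loss_step` and `drift_matching_step`, and the choice to update the two networks in alternation are assumptions for illustration, and the stopping criterion used in the study may differ.

```python
def has_plateaued(history, window=5, tol=1e-3):
    """Return True when the last `window` loss values vary by less than `tol`."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tol


def train(prior_loss_step, drift_matching_step, n_epochs=100):
    """Run both objectives and stop updating the encoder once the prior loss is flat.

    prior_loss_step():     hypothetical callback, updates the encoder, returns the prior loss
    drift_matching_step(): hypothetical callback, updates the decoder, returns its loss
    Returns the epoch at which the encoder was frozen, or None if it never was.
    """
    prior_history = []
    encoder_frozen_at = None
    for epoch in range(n_epochs):
        if encoder_frozen_at is None:
            prior_history.append(prior_loss_step())
            if has_plateaued(prior_history):
                encoder_frozen_at = epoch
        drift_matching_step()  # the decoder keeps training after the encoder stops
    return encoder_frozen_at


def make_decaying_loss(start=1.0, rate=0.5, floor=0.1):
    """Stub standing in for a real optimizer step: the loss decays toward `floor`."""
    state = {"value": start}

    def step():
        state["value"] = floor + (state["value"] - floor) * rate
        return state["value"]

    return step


frozen_at = train(make_decaying_loss(), make_decaying_loss(), n_epochs=50)
print(f"encoder frozen at epoch {frozen_at}")
```

In a real setup the two callbacks would wrap optimizer steps for the encoder and decoder networks; only the control flow is the point of this sketch.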
“The objective function is composed of the prior loss and drift matching parts, which characterizes the training of neural networks in the encoder and the decoder, respectively. Together, they reduce the computational cost of training SB-type models. It was demonstrated that interrupting the training of the encoder mitigated the challenge of overfitting,” explains Ohzeki.
This approach is flexible and can be applied to other probabilistic rule sets, even non-Markov processes, making it a broadly applicable training scheme.
***
About Institute of Science Tokyo (Science Tokyo)
Institute of Science Tokyo (Science Tokyo) was established on October 1, 2024, following the merger between Tokyo Medical and Dental University (TMDU) and Tokyo Institute of Technology (Tokyo Tech), with the mission of “Advancing science and human wellbeing to create value for and with society.”
Journal: Physical Review Research
Method of Research: Computational simulation/modeling
Subject of Research: Not applicable
Article Title: Schrödinger bridge-type diffusion models as an extension of variational autoencoders
Article Publication Date: 3-Sep-2025
COI Statement: The authors declare no conflicts of interest regarding this manuscript.