News Release

Exploring a novel approach for improving generative AI models

Study reinterprets Schrödinger bridge models to reduce overfitting and training costs in generative AI models

Peer-Reviewed Publication

Institute of Science Tokyo

Figure: Novel approach to advance generative AI models. The developed model modifies Schrödinger bridge-type diffusion models to add noise to real data through the encoder and reconstructs samples through the decoder. It uses two objective functions, the prior loss and drift matching, to reduce computational cost and prevent overfitting. (Credit: Institute of Science Tokyo)

Researchers at Science Tokyo have developed a new framework for generative diffusion models. The method reinterprets Schrödinger bridge models as variational autoencoders with infinitely many latent variables, reducing computational costs and preventing overfitting. By interrupting the training of the encoder at the right moment, the approach enables more efficient generative AI, with broad applicability beyond standard diffusion models.

Diffusion models are among the most widely used approaches in generative AI for creating images and audio. These models generate new data by gradually adding noise (noising) to real samples and then learning how to reverse that process (denoising) to produce realistic data. A widely used variant, the score-based model, achieves this with a diffusion process that connects the prior distribution to the data over a sufficiently long time interval. However, this method has a limitation: when the data distribution differs strongly from the prior, the noising and denoising processes require longer time intervals, which slows down sample generation.
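
For orientation, the standard score-based construction (textbook notation, not the paper's own) pairs a forward noising SDE with its time reversal:

    dX_t = f(X_t, t) dt + g(t) dW_t                                  (noising)
    dX_t = [f(X_t, t) − g(t)² ∇log p_t(X_t)] dt + g(t) dW̄_t          (denoising)

Here p_t is the distribution of X_t, and a neural network is trained to approximate the score ∇log p_t; the second equation is run backward in time, from the prior to the data.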

Now, a research team from Institute of Science Tokyo (Science Tokyo), Japan, has proposed a new framework for diffusion models that is faster and computationally less demanding. They achieved this by reinterpreting Schrödinger bridge (SB) models, a type of diffusion model, as variational autoencoders (VAEs).

The study was led by graduate student Mr. Kentaro Kaba and Professor Masayuki Ohzeki from the Department of Physics at Science Tokyo, in collaboration with Mr. Reo Shimizu (then a graduate student) and Associate Professor Yuki Sugiyama from the Graduate School of Information Sciences at Tohoku University, Japan. Their findings were published in Volume 7, Issue 3 of Physical Review Research on September 3, 2025.

SB models offer greater flexibility than standard score-based models because they can connect any two probability distributions over a finite time using a stochastic differential equation (SDE). This supports more complex noising processes and higher-quality sample generation. The trade-off, however, is that SB models are mathematically complex and expensive to train.
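
In its generic form (again, standard notation rather than the paper's), the Schrödinger bridge problem picks the path measure P closest to a reference diffusion Q while pinning both endpoints:

    minimize KL(P ‖ Q)  subject to  P_0 = p_data,  P_T = p_prior.

Because the time horizon T is finite and both marginals are fixed, the bridge reaches the prior in a fixed, short time even when the data is far from the prior; the cost is a harder optimization problem.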

The proposed method addresses this by reformulating SB models as VAEs with multiple latent variables. “The key insight lies in extending the number of latent variables from one to infinity, leveraging the data-processing inequality. This perspective enables us to interpret SB-type models within the framework of VAEs,” says Kaba.
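
To unpack that reinterpretation, recall the ordinary VAE bound with a single latent variable z (standard material, not the paper's extended derivation):

    log p(x) ≥ E_{q(z|x)}[ log p(x|z) ] − KL( q(z|x) ‖ p(z) ).

In the SB reading, the single latent z becomes the entire noisy trajectory of the forward process, so the encoder q and decoder p turn into stochastic processes rather than single mappings.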

In this setup, the encoder represents the forward process that maps real data onto a noisy latent space, while the decoder reverses that process to reconstruct realistic samples. Both processes are modeled as SDEs learned by neural networks.
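
As a concrete illustration, here is a minimal sketch of what such a pair of neural SDEs could look like in code. Everything below is hypothetical: the class names, network sizes, and the Euler-Maruyama integrator are illustrative choices, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class DriftNet(nn.Module):
        """Small MLP giving the drift of an SDE at state x and time t."""
        def __init__(self, dim: int, hidden: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim + 1, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, x, t):
            # Append the scalar time to every state vector.
            return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=1))

    def simulate_sde(x0, drift, sigma, n_steps=100, t0=0.0, t1=1.0):
        """Euler-Maruyama integration of dX = drift(X, t) dt + sigma dW."""
        dt = (t1 - t0) / n_steps
        x, path = x0, [x0]
        for k in range(n_steps):
            t = torch.tensor(t0 + k * dt)
            x = x + drift(x, t) * dt + sigma * abs(dt) ** 0.5 * torch.randn_like(x)
            path.append(x)
        return torch.stack(path)

    # Encoder: forward-in-time SDE that noises data toward the prior.
    # Decoder: a second SDE, integrated backward, that denoises samples.
    encoder_drift = DriftNet(dim=2)
    decoder_drift = DriftNet(dim=2)
    data = torch.randn(64, 2)  # stand-in for a real dataset
    noisy = simulate_sde(data, encoder_drift, sigma=1.0)
    recon = simulate_sde(noisy[-1], decoder_drift, sigma=1.0, t0=1.0, t1=0.0)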

The model employs a training objective with two components. The first is the prior loss, which ensures that the encoder correctly maps the data distribution to the prior distribution. The second is drift matching, which trains the decoder to mimic the dynamics of the reverse encoder process. Moreover, once the prior loss stabilizes, encoder training can be stopped early. This lets training finish faster, reducing the risk of overfitting while preserving the high accuracy of SB models.
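
Continuing the sketch above, a training loop for the two-part objective and the early stop on the encoder might be wired up as follows. The loss helpers and the plateau rule are illustrative placeholders, not the paper's actual loss functions:

    # Hypothetical plumbing around the two losses described in the text.
    enc_opt = torch.optim.Adam(encoder_drift.parameters(), lr=1e-3)
    dec_opt = torch.optim.Adam(decoder_drift.parameters(), lr=1e-3)

    def prior_loss(path):
        """Placeholder: penalize mismatch between the encoder's endpoint
        and a standard normal prior via simple moment matching."""
        z = path[-1]
        return z.mean(0).pow(2).sum() + (z.var(0) - 1.0).pow(2).sum()

    def drift_matching_loss(path):
        """Placeholder surrogate: fit the decoder drift at a midpoint state;
        the paper's real target is the reverse-time drift of the encoder."""
        t = torch.tensor(0.5)
        x = path[len(path) // 2]
        return (decoder_drift(x, t) + encoder_drift(x, t).detach()).pow(2).mean()

    encoder_frozen, history, patience = False, [], 20
    for step in range(5000):
        path = simulate_sde(data, encoder_drift, sigma=1.0)
        if not encoder_frozen:
            loss_p = prior_loss(path)
            enc_opt.zero_grad(); loss_p.backward(); enc_opt.step()
            history.append(loss_p.item())
            # Freeze the encoder once the prior loss has plateaued.
            if len(history) > patience and abs(history[-1] - history[-patience]) < 1e-4:
                encoder_frozen = True
        loss_d = drift_matching_loss(path.detach())
        dec_opt.zero_grad(); loss_d.backward(); dec_opt.step()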

“The objective function is composed of the prior loss and drift matching parts, which characterize the training of the neural networks in the encoder and the decoder, respectively. Together, they reduce the computational cost of training SB-type models. It was demonstrated that interrupting the training of the encoder mitigated the challenge of overfitting,” explains Ohzeki.

This approach is flexible: it can be applied to other classes of stochastic processes, including non-Markovian ones, making it a broadly applicable training scheme.

 

***

About Institute of Science Tokyo (Science Tokyo)

Institute of Science Tokyo (Science Tokyo) was established on October 1, 2024, following the merger between Tokyo Medical and Dental University (TMDU) and Tokyo Institute of Technology (Tokyo Tech), with the mission of “Advancing science and human wellbeing to create value for and with society.”

