It is rocket science
National Center for Supercomputing Applications
NCSA has a history of working towards efficiency and sustainability in computational research. A significant collaborative effort headed up by researchers from the Georgia Institute of Technology (Georgia Tech), NVIDIA, Oak Ridge National Laboratory (ORNL), Advanced Micro Devices, Hewlett Packard Enterprise (HPE) and the Courant Institute of Mathematical Sciences at New York University (CIMS) truly showcases how giant supercomputers help researchers save enormous amounts of resources in the long run.
The project, which simulates a multi-engine spacecraft, has been selected as a finalist for the 2025 Association for Computing Machinery (ACM) Gordon Bell Prize. Simulations of this kind require an extremely powerful supercomputer, and securing time on such machines is an achievement in itself. The largest supercomputers in the world are typically dedicated to a select number of research projects, and the specialists who operate them want to ensure that the time spent on them is put to good use.
In this case, the research team aimed to simulate the compressible fluid flows around a spacecraft with a large number of rocket engines, similar to those SpaceX is currently testing. Such simulations are needed to understand a new rocket science problem specific to many-engine configurations: base heating. Base heating occurs when hot exhaust from many clustered engines is reflected back toward the rocket's tail, which can be catastrophic. The team's work lets engineers test complex engine layouts, such as the 33-engine Super Heavy booster that inspired the research, helping to prevent mission failures before the first prototype is built.
This means that simulations can predict booster behavior without every new design having to be manufactured. Design changes can therefore be made more quickly, shortening research and development timelines. NCSA has been helping researchers create efficient designs through robust simulation for years, but for something as momentous as a spacecraft launch, the resource savings are apparent even to casual observers: when a computer simulation fails, it doesn’t destroy all the hard work of building a craft. Instead, researchers can alter a variable and rerun the test in seconds.
Spencer Bryngelson, an assistant professor and researcher in the School of Computational Science & Engineering at Georgia Tech, leads the research team. His group and its collaborators are refining the simulation for maximum efficiency. The team's breakthrough centers on a new, optimized mathematical technique. Traditional simulations are forced to “blur” the shock waves produced by high-speed airflow, which erases tiny but important details, such as how turbulence and shock waves behave. The new method, called Information Geometric Regularization (IGR), was first developed by co-lead Florian Schäfer, an assistant professor at CIMS; it is akin to swapping that “blur” for a clear lens. IGR lets the team model the physics realistically without compromising the numerical methods used to solve the fluid flow equations. This is the key reason they could scale their work so dramatically.
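To make the “blur” concrete, here is a minimal toy sketch under stated assumptions. It is not IGR and not the team's code: it uses a textbook first-order upwind scheme (chosen here purely for illustration) to carry a sharp jump across a periodic 1D grid. The scheme's built-in numerical dissipation smears each jump over many cells, the kind of smearing that hides fine-scale detail. The function names `upwind_step` and `blurred_cells` are hypothetical.

```python
# Toy illustration only: first-order upwind advection of a sharp jump.
# This is NOT information geometric regularization (IGR); it only shows how
# numerical dissipation in a simple scheme smears ("blurs") a sharp front.

def upwind_step(u, c):
    """One step of u_t + a*u_x = 0 with a > 0, where c = a*dt/dx (CFL number, 0 < c <= 1).

    The grid is periodic: for i = 0, u[i - 1] is u[-1], the last cell.
    """
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]


def blurred_cells(u, lo=0.1, hi=0.9):
    """Count cells whose value lies strictly between lo and hi, i.e. inside a smeared front."""
    return sum(1 for v in u if lo < v < hi)


n = 200
# Initial profile: 1 on the left half, 0 on the right half.
# On a periodic grid this gives two sharp jumps.
u0 = [1.0 if i < n // 2 else 0.0 for i in range(n)]

u = u0
for _ in range(100):
    u = upwind_step(u, c=0.5)

print("blurred cells before:", blurred_cells(u0))
print("blurred cells after: ", blurred_cells(u))
```

The total amount of the transported quantity is conserved, yet the once-sharp jumps now spread across many cells. Shock-capturing methods face a related trade-off between stability and sharpness, which is the problem the article says IGR addresses differently.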
It’s challenging to secure time on some of the world's largest supercomputers, so it is crucial to have a simulation fully optimized before using that time. To prepare for work on larger supercomputers, Bryngelson used NCSA’s GPU-based supercomputer, DeltaAI, to tune the simulation.
“We used DeltaAI mostly for tuning our algorithm,” he said. “We’d run on one or a couple of nodes to see the simulation evolve. And then we would take that algorithm to one of the other large computers we had access to, and we could start the process there.”
To get time on DeltaAI, Bryngelson turned to the U.S. National Science Foundation ACCESS program. Although DeltaAI isn’t as large as some of the other supercomputers Bryngelson had access to, its NVIDIA GH200 Grace Hopper Superchips made it an ideal testbed for his work.
“Our team has run simulations on many different computers. In this case, it was Jupiter, Europe’s new fastest supercomputer, located in Germany, and Alps at CSCS in Switzerland, which both use NVIDIA Grace Hopper architecture. These are the biggest NVIDIA computers in the world that are available for open science research,” Bryngelson explained. “The use of Grace Hopper chips is very important. A big part of the reason that we were able to do these very large simulations was the time we spent refining them on NCSA and sister machines.”
The architecture of the NCSA machines is another reason Bryngelson chose DeltaAI. Like DeltaAI, the Alps supercomputer pairs Grace Hopper chips with HPE Cray technology, which made DeltaAI a strong testbed for his research. “The Delta computers are made by HPE Cray. HPE Cray is very good at power monitoring. They're very good about setting up really nice software environments that are consistent. And it just so happens that many of the other giant supercomputers in Europe we were working on are also Cray machines. That makes it much more straightforward to perform power monitoring, for instance, by determining how much power is being used by the different compute resources we’re using. This information is essential for our work because we’re attempting to create a more efficient algorithm. We want to know if it runs at the same speed but uses half the energy. It turns out, it does better.”
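The comparison Bryngelson describes, same speed but half the energy, comes down to energy-to-solution: power integrated over the run's wall time. Below is a minimal sketch, assuming power samples (in watts) have already been collected from the system's monitoring tools. The traces and the function name `energy_joules` are hypothetical, not measurements from the team's runs.

```python
# Energy-to-solution sketch: integrate sampled power over wall time.
# The power traces below are hypothetical, not measurements from the team's runs.

def energy_joules(times_s, power_w):
    """Trapezoidal integral of power samples (watts) over sample times (seconds) -> joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching time and power samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Two hypothetical runs with the same 100 s wall time ("runs at the same speed").
t = [0.0, 25.0, 50.0, 75.0, 100.0]
baseline_w = [800.0, 800.0, 800.0, 800.0, 800.0]   # hypothetical baseline draw
candidate_w = [400.0, 400.0, 400.0, 400.0, 400.0]  # hypothetical new-algorithm draw

e_base = energy_joules(t, baseline_w)
e_new = energy_joules(t, candidate_w)
ratio = e_base / e_new

print(f"baseline: {e_base:.0f} J, new: {e_new:.0f} J, ratio: {ratio:.1f}x")
```

With equal wall times, the energy ratio reduces to the ratio of average power. When runs differ in duration, integrating the trace captures both effects, which is why energy-to-solution, rather than instantaneous power, is the natural figure for comparing algorithms.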
Using the ACCESS program also brought other benefits for Bryngelson’s team. Because the team is international, getting every member onto a resource often involves considerably more administrative overhead. The ACCESS program, however, offers the flexibility to support a large collaborative effort that includes international researchers, allowing the collaboration to succeed.
“Having access to an open science system for this research has made it easy for our team to get results,” Bryngelson said. “Obtaining an allocation on ACCESS and then transitioning to a new machine, such as DeltaAI, was straightforward. We avoid jumping through numerous hoops to get new team members set up on the system. Being able to simply say, 'I have an allocation here; I've checked the box,' now means my students can use it, which is extremely valuable.”
This type of research may require a significant upfront investment in computing power, but the long-term benefits extend far beyond launching a large spacecraft, which is already a considerable accomplishment. Bryngelson’s work, which produced an extremely efficient fluid dynamics model, can be applied to fluid dynamics simulations across a broad array of applications. His team’s simulation is confirmed as the largest-ever fluid dynamics calculation, running on a grid of 200 trillion points, a scale at least 20 times larger than anything done before. Importantly, this massive scale did not come at a large cost. Compared with the best previous methods, their approach ran four times faster and achieved up to a 5.4-fold reduction in the energy needed to reach the final answer. These savings in both time and energy are what truly define “sustainability” in supercomputing.
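The scale and savings figures above can be checked with a few lines of arithmetic. Fold changes and percentages are easy to mix up: a k-fold reduction leaves 1/k of the original amount. This is a minimal sketch. `fold_to_percent_saved` is a hypothetical helper, and `previous_record_max` is only an upper bound derived from the “at least 20 times larger” wording, not a reported figure.

```python
def fold_to_percent_saved(fold):
    """A k-fold reduction leaves 1/k of the original amount; return the percent saved."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return (1.0 - 1.0 / fold) * 100.0


grid_points = 200e12   # 200 trillion grid points (from the article)
size_factor = 20       # "at least 20 times larger than anything done before"
previous_record_max = grid_points / size_factor  # upper bound on the prior record

time_saved = fold_to_percent_saved(4.0)   # "four times faster" -> less wall time
cost_saved = fold_to_percent_saved(5.4)   # "up to a 5.4-fold reduction"

print(f"prior record: at most {previous_record_max:.0e} grid points")
print(f"4x speedup     -> {time_saved:.1f}% less wall time")
print(f"5.4x reduction -> {cost_saved:.1f}% less; {100 - cost_saved:.1f}% of the original remains")
```

Read in percentage terms, a 5.4-fold reduction means the new approach needs about 18.5 percent of what the best previous method required, roughly 81.5 percent less, and a fourfold speedup cuts wall time by 75 percent.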
“In the world of multiphase flow, there are so many applications for this research,” said Bryngelson. “Almost all flows have some turbulent features, of engineering interest or not. Contending with those scales is costly and has been a topic of scientific investigation since the very first supercomputers. Well, it really started well before then! Many things, from different types of nuclear reactors to blood flow research, could potentially benefit from this. The code that we write is open source, so I welcome anyone who wants to build on our work.”
ABOUT DELTA AND DELTAAI
NCSA’s Delta and DeltaAI are part of the national cyberinfrastructure ecosystem through the U.S. National Science Foundation ACCESS program. Delta (OAC 2005572) is a powerful computing and data-analysis resource combining next-generation processor architectures and NVIDIA graphics processors with forward-looking user interfaces and file systems. The Delta project partners with the Science Gateways Community Institute to empower broad communities of researchers to easily access Delta and with the University of Illinois Division of Disability Resources & Educational Services and the School of Information Sciences to explore and reduce barriers to access. DeltaAI (OAC 2320345) maximizes the output of artificial intelligence and machine learning (AI/ML) research. Tripling NCSA’s AI-focused computing capacity and greatly expanding the capacity available within ACCESS, DeltaAI enables researchers to address the world’s most challenging problems by accelerating complex AI/ML and high-performance computing applications that process terabytes of data. Additional funding for DeltaAI comes from the State of Illinois.
