Daniel Bodony's love of science began with a love of airplanes. On weekends he worked for one of his dad's colleagues, who owned an airplane. "I would mow his grass and he would let me fly," Bodony remembers fondly.
Those early boyhood days fueled the fire for Bodony, and he committed himself to a career as a military pilot. At that time, however, the military did not accept pilots who wore glasses. Taking that in stride, Bodony decided he would instead design airplanes, only to experience another shift in his early career.
"When I got to college and started to design airplanes I realized that I liked the science behind the design more than I liked the design itself," he said.
Bodony, the Blue Waters Associate Professor in Aerospace Engineering at the University of Illinois at Urbana-Champaign (UIUC), is looking into the science surrounding the aeroacoustics of jet engines and researching how to make them quieter.
A veteran user of NSF high performance computing (HPC) resources since 2008, Bodony says: "The reason we use supercomputers is because in aeroacoustics there is no simple relationship that relates an unsteady flow field to the sound it creates. So we have to resort to elaborate experiments or simulations to try to come up with the contextual underpinnings that relate cause and effect. And we still haven't done it. The fact that aircraft has gotten quieter over the years is more by accident than by design, and we're trying to change that, but it relies on bigger calculations, bigger codes, and more complex computing capabilities."
The computational challenges that Bodony and his team face invariably involve turbulence, which is an unsteady, chaotic motion of a fluid. In the practicalities of calculating a turbulent flow, a researcher has two options: 1) make many assumptions and have a small computational model or 2) make few assumptions and have a very large computational model. Because the researchers don't yet understand sound generation at a fundamental level, they have to resolve all of the scales of motion involved in the turbulent flow.
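To give a feel for why the "few assumptions" route leads to very large models, here is a minimal sketch in Python. It uses the standard textbook (Kolmogorov) estimate that the ratio of largest to smallest turbulent scales grows like the Reynolds number to the 3/4 power, so a 3-D grid that resolves every scale needs on the order of Re^(9/4) points. That estimate and the function name `dns_grid_points` are assumptions brought in for illustration; they are not taken from Bodony's code, and the result is an order-of-magnitude guide only.

```python
def dns_grid_points(reynolds_number):
    """Rough 3-D grid-point count needed to resolve every turbulent scale.

    Assumes the textbook scaling L/eta ~ Re**(3/4) in each direction and
    ignores constant factors, geometry, and the number of time steps.
    """
    scale_ratio = reynolds_number ** 0.75   # largest-to-smallest scale ratio
    return scale_ratio ** 3                 # one such ratio per spatial direction


if __name__ == "__main__":
    for re in (1e3, 1e4, 1e5, 1e6):
        print(f"Re = {re:.0e}: roughly {dns_grid_points(re):.1e} grid points")
```

Under this estimate, every tenfold increase in Reynolds number multiplies the grid size by well over a hundred, which is one way to see why this kind of work gravitates to the largest machines available.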
"It's a classical multi-scale problem," Bodony says. "Computational research is required to resolve all of those scales which requires us to use the largest computers to which we have access - XSEDE's Stampede being one of them." The NSF Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated advanced digital resources and services in the world. It is a single virtual system that scientists can use to interactively share computing resources, data, and expertise.
XSEDE's Extended Collaborative Support Service (ECSS)
Through XSEDE's Extended Collaborative Support Service (ECSS) program, researchers have access to cyberinfrastructure experts with a variety of expertise. ECSS experts, many with advanced degrees in domain areas, are available for collaborations lasting months to a year to help researchers fundamentally advance their use of XSEDE resources.
Bodony has used the ECSS program for a variety of projects. When asked if he would recommend the ECSS program to other researchers, his response was a quick, "Yes, wholeheartedly. The ECSS experts are able to look at the code and understand the hardware and software very quickly to make a diagnosis."
Currently, through XSEDE's ECSS program, Bodony works primarily with Luke Wilson from the Texas Advanced Computing Center, one of the top advanced computing centers for open science in the nation. Wilson serves as the technical expert: his knowledge of the hardware, and of how the software interacts with it, is helping Bodony and his team achieve real performance gains on their code.
Three categories of ECSS support exist for projects: Extended Support for Research Teams; Extended Support for Community Codes; and Extended Support for Gateways. Bodony is in the Research Teams category, with the single-investigator code known as PlasComCM, a multi-physics solver that can solve for the motion of a compressible viscous fluid together with a compressible, finite-strain solid.
"When we run our code we have a basic idea of what its weaknesses are, and we try to identify the biggest weakness that impacts our ability to run efficiently on XSEDE systems, including being able to utilize Stampede's Intel® Xeon Phi™ processors," Bodony says.
According to Wilson, "the goal has always been to get this code up and running on the Intel® Xeon Phi™, and we started out looking for some simple places we could target to improve performance, mostly through vectorization. We found that the data encoding and the original algorithm weren't well suited to the Xeon Phi...there was a lot of out of order memory access, which you can't vectorize very easily."
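The difference between in-order and out-of-order memory access that Wilson describes can be sketched in a few lines of Python. This is an illustration only, not PlasComCM code: the helper names and the index table are hypothetical. Vector units work best when a loop touches consecutive memory locations (a stride of one); a loop that reaches its data through a lookup table jumps around in memory instead.

```python
def access_strides(indices):
    """Distance between consecutive memory locations touched by a loop."""
    return [b - a for a, b in zip(indices, indices[1:])]


def is_unit_stride(indices):
    """True when every access lands right after the previous one."""
    return all(stride == 1 for stride in access_strides(indices))


# A loop that reads a[0], a[1], a[2], ... in order.
contiguous = list(range(8))

# A loop that reads a[idx[0]], a[idx[1]], ... through a hypothetical index table.
indirect = [5, 0, 7, 2, 6, 1, 4, 3]

if __name__ == "__main__":
    print("contiguous strides:", access_strides(contiguous))
    print("indirect strides:  ", access_strides(indirect))
    print("contiguous is unit stride:", is_unit_stride(contiguous))
    print("indirect is unit stride:  ", is_unit_stride(indirect))
```

Both loops read the same eight values; only the order differs. Changing the data layout so that the second pattern looks like the first is the kind of restructuring the quote points to.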
At first, Wilson ran a simple performance analysis of the code and identified the algorithmic weaknesses in order to find better ways to express the algorithms. Sometimes the algorithms needed new data structures, sometimes an entirely new algorithm had to be implemented to perform the same operation, and sometimes the researchers rewrote part of the code so that the algorithm was no longer necessary.
It was a team effort among Bodony, Wilson, and several people at UIUC, but Wilson was instrumental in taking the cumulative view of the code and speeding it up by a factor of seven. How? He figured out where the performance bottlenecks were. The code performed too many memory loads for every floating-point operation: it had to copy data out of memory, add and multiply, and then store the result back into memory.
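The cost of too many memory loads per floating-point operation can be estimated with a simple roofline-style calculation. The sketch below is written in Python under stated assumptions: the function names are hypothetical, and the peak and bandwidth figures are placeholders, not measurements of Stampede or of PlasComCM. It models an update that loads two values, does one multiply and one add, and stores one value.

```python
def arithmetic_intensity(flops, loads, stores, bytes_per_value=8):
    """Floating-point operations performed per byte moved to or from memory."""
    return flops / ((loads + stores) * bytes_per_value)


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline estimate: limited by either compute peak or memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# y[i] = a * x[i] + y[i]: one multiply and one add, two loads, one store,
# with 8-byte (double-precision) values.
update_intensity = arithmetic_intensity(flops=2, loads=2, stores=1)

# Placeholder machine numbers, chosen only to make the arithmetic visible.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    rate = attainable_gflops(update_intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"intensity: {update_intensity:.3f} flops per byte")
    print(f"attainable: {rate:.1f} of {PEAK_GFLOPS:.0f} GFLOP/s")
```

With these placeholder numbers the update is limited by memory traffic to a small fraction of the compute peak. Removing unnecessary loads, or doing more arithmetic on each value once it has been loaded, raises the intensity and with it the attainable rate.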
"It took a long time to figure out that some of the constructs we were using in Fortran were causing unnecessary memory loads, and that was a big shock to us," Bodony said. "We thought the compilers were supposed to do this automatically." Then they found that by reordering some of the adds and multiplies they could get better cache utilization and better vectorization. Once multiple adds and multiplies were executing concurrently, the code moved closer to 100 percent of Stampede's theoretical peak performance.
"We still have a long way to go with jet noise and we're going to continue to follow jet noise for the foreseeable future," Bodony says. "We think the flow that exits the jet engines contains the information that we need to figure out how to make jet engines quieter. We just haven't probed it in the right way. We've been working on tools to extract that information. Our current hypothesis has shown that our idea has merit using small scale simulations and now we're applying these ideas to the full scale jet noise problem."
What's next for Bodony and team?
According to Bodony, future computers are going to look a lot different than Stampede or any of the other NSF-funded systems. Now, they're not using ECSS to focus on performance; they're working with ECSS to change how the code is programmed.
As part of the exascale applications group, Bodony and his team are focused on building scalable algorithms. "How you program on future machines is going to be very different from how we program for Stampede," Bodony said. "What that means is that the codes that we have now may not run on future machines. We're trying to rewrite the code in such a way that it's ready for those future machines. Luke and I are working together to figure out how to fix our current code and transform it into one that's useful at exascale."
"Most people think that Knights Landing (a second-generation Xeon Phi product using a 14 nm process) is a preview of what processors will look like going forward as we push toward exascale -- lots of concurrency, many cores in a single package, the memory footprint per thread is going to be very small -- so we will completely rethink the way we solve our problems. It's safe to say that every time a new processor comes out it's a completely new challenge," Wilson concluded.
Dr. Dan Bodony's research is funded by the Office of Naval Research. XSEDE and TACC's Stampede supercomputer are funded by the National Science Foundation.