Feature Story | 11-Dec-2023

Software in science

California Institute of Technology

Computers dominate so many people's lives. Who doesn't log hours of screen time every day on their "phone"—a device that, yes, can initiate or receive telephone calls but mostly serves as a mini-portal to their email, texts, and a whole web of information and entertainment?

For the average person, a key feature of software is its invisibility. It chugs along in the background, rarely perceived (at least until it ceases to function as desired). For scientists, however, background is often foreground. It was scientists, after all, who first developed computing software to carry out routine tasks with an error-free speed previously only imagined.

Today, to a perhaps surprising degree, scientists are still writing their own code. Can we be content with this as a simple observation, or should we regard it as a cause for celebration or dismay?

For the Schmidt Academy at Caltech, now entering its fifth year, it's a bit of all of the above. As it is increasingly inefficient to ask scientists to write, maintain, and optimize the code they need for their research, scientists need translators: people who can understand science but whose greatest fluency is in programming languages.

The Schmidt Academy offers an ingenious intervention to address this dilemma: Recent computer science graduates are invited to Caltech, embedded in science labs for a year or two, and mentored; meanwhile, they bring software engineering best practices to their host research groups—hopefully inspiring all team members to write better code in the process.

Meet Howard Deshong, Schmidt Scholar, recent graduate of Harvey Mudd College, and happy resident in Assistant Professor of Physics Katerina Chatziioannou's lab, which analyzes data from the Laser Interferometer Gravitational-wave Observatory (LIGO). "The Schmidt Academy appealed to me because it catered to two of my greatest interests—science and software," Deshong says. "It promised to scratch both those itches and has lived up to that promise."

Since 2022, Deshong has been rewriting the code for BayesWave, a program used to process data from LIGO's detectors in Hanford, Washington, and Livingston, Louisiana. "These detectors change in length as gravitational waves pass through them, warping space-time as described by general relativity," Deshong explains. "The changes are incredibly small—the detectors are kilometers long, and they change in length by less than an atom's width under the influence of gravitational waves—so scientists and engineers have put a tremendous effort into cleaning up noise in the data."

BayesWave has been assembled over the past 10 years, almost entirely by LIGO scientists who are, of course, extremely familiar with how the detectors work, the potential sources of extraneous noise, and which patterns of gravitational waves indicate particular types of cosmic events. But the field of gravitational-wave astrophysics is expanding rapidly, with more than 100 detections to date. The code's applications have multiplied as many LIGO scientists have conceived and implemented new use cases, and more advances are yet to come. The code base had become, in Chatziioannou's term, "unwieldy." And so she applied to the Schmidt Academy, hoping to bring on a software engineer who could "clean up the baseline stuff" by translating BayesWave from C, a programming language first developed in the 1970s, into C++, a language that grew out of C, and thereby enable LIGO scientists to meet the needs of upcoming observations.

Deshong, Chatziioannou points out, "could have learned nothing and just simply taken the existing code and translated it and fixed things." But because Deshong attends lab seminars and can grasp the science that motivates the creation and expansion of the software, he is able to work with a variety of graduate students, postdocs, and other scientists to better serve their data-analysis needs. And if Deshong is overwhelmed by the software engineering required, he can go to his mentor, Donnie Pinkston (BS '98), instructor for the Schmidt Academy. "He's bursting with advice about software design," Deshong says. "He'll either have thoughtful answers at the ready or he'll know where to point me so I can learn more."

Schmidt Futures

The Schmidt Academy was funded in 2018 by Schmidt Futures, a philanthropic initiative founded by Eric and Wendy Schmidt. Mike Gurnis, the John E. and Hazel S. Smits Professor of Geophysics and director and leadership chair of Caltech's Seismological Laboratory, was recruited as director, and the Schmidt Academy was underway.

Four "Schmidt Scholars" came to Caltech in the summer of 2019 to begin work with Pinkston, a Caltech alumnus who has been teaching computer science at the Institute since 2005. With a projected 12 Schmidt Scholars joining the original four in 2020, Gurnis brought on Dave Rumph (BS '80), who had a prior career as a software engineer at Xerox's Palo Alto Research Center, where he helped develop the first color laser-printer prototype, among other achievements.

It quickly became apparent that graduating Caltech computer science students would not be available to staff all of the Schmidt Academy positions. "A lot of the undergraduates at Caltech are ready to go someplace new when they graduate," says Pinkston, "and they're getting tempting industry offers."

Fortunately, the solution to the dilemma lay only 25 miles east at Harvey Mudd College, the science and engineering college of the Claremont College Consortium. Caltech alumna Katherine Breeden (BS '08), assistant professor of computer science at Harvey Mudd, joined the Schmidt Academy steering committee early on and has since been instrumental in recruiting recent Harvey Mudd graduates to the program at Caltech. Harvey Mudd's computer science graduates have abundant job opportunities in industry, but Caltech is new to them, as is the opportunity to work in such a rich variety of research labs. Both Caltech and Harvey Mudd grads now fill the ranks of each class of Schmidt Scholars.

Why Scientists Need Software Engineers

There is no disputing that software is important in the sciences today. As Tapio Schneider, the Theodore Y. Wu Professor of Environmental Science and Engineering and JPL Senior Research Scientist, points out, "In some areas, the entire scientific workflow can be automated. Materials design is one such area. Typically, you conceive of a material, synthesize or assemble it, and characterize and test its properties in a lab. Rinse and repeat. All of that used to be done manually, but now you can automate a lot of it, sometimes all of it, through AI [artificial intelligence] coupled with automated laboratories."

With processes this complex, scientists must write software—or at least have a deep understanding of what it can and cannot do well—and then successfully communicate about it to others in their lab. As Schneider says, "AI and computing are the new calculus. Similar to how calculus and statistics are fundamental components of the education of scientists and engineers, learning about AI principles and applications should be equally emphasized." Or as Gurnis puts it, "Everything we do in science is dependent upon really good software."

Certainly, scientists have a role to play in developing the software for their own projects, and, Gurnis says, "Often these advances are quite brilliant." But, predictably, there are limitations. Learning software engineering on top of scientific theories and methods is a big lift. "It's not something you can learn in a month," remarks Chatziioannou. "It's a massive skillset."

Beyond the question of skill is the fact that most labs at research universities such as Caltech have an intentionally high rate of turnover: One of their main goals is to train budding scientists, who are typically only in residence for three to five years. David Van Valen (PhD '11), assistant professor of biology and biological engineering and investigator at the Heritage Medical Research Institute, has hosted three Schmidt Scholars in his lab to date. He says one of the biggest roadblocks for scientists writing software is that "you have one science problem that requires analysis, so you do just enough to solve your one problem—and then the student in charge of that graduates. With each person, with each new problem, the wheel has to be reinvented."

Rumph paints the picture this way: "Grad students or postdocs get their data. They write it up, they publish it, they get their degree, and they leave. Then some new grad student comes in and wants to extend that work. The PI [principal investigator] says, 'Oh, Fred was working on something like that. You should just pick up where he left off. I think it was version 3, or maybe 3A?' There's no documentation, and it doesn't even work anymore. The new student spends a month or two trying to figure out how this software was supposed to work. They tear their hair out, and eventually they say, 'Forget it, I'm going to start over.' And they make all the same mistakes over again as they write new software, and it costs them six months."

Even without changes in lab personnel, software crafted for a narrow purpose tends not to age well. "Software is not a stable thing," Pinkston explains. "It exists in an environment that's continually moving. The operating system changes, libraries get updated, hardware changes. And what happens is that the code that worked perfectly is useless two years later." Either the software stops working, Gurnis says, "or it is very sophisticated, but it can't be modified to do the latest and the greatest thing that the science needs it to do."

"If the software had been modular and acted more like a toolbox, if it was well documented and extensible and had tests in place that made sure it ran," Rumph says, "then they wouldn't have to waste that time." It is these "software engineering best practices," as they are known in the trade, that the Schmidt Academy teaches.

What the Schmidt Academy Offers to Young Software Engineers

In its first four years of operation, the Schmidt Academy has successfully recruited all of the scholars needed to fill available positions at Caltech, but this is not a trivial undertaking. "Software engineers have many, many opportunities in business and big tech and government. Everyone wants these people," Gurnis says. "What we want to do is funnel some of them, at least temporarily, through research universities."

Breeden concurs. "A lot of new positions are in areas like security and AI," she says. "Those folks are obviously excited to recruit our students, but there are so many other cool things that you can do with computing too." When recruiting for the Schmidt Academy, Breeden tells students, "If you make a tweak to Kindle software that rolls out to 10 million customers, you have a long lever that affects a ton of people a tiny amount. That's one way of measuring your impact. But with scientific software, you might be working on something more bespoke. Maybe there are only a hundred researchers in the world working in this specific area of, say, cosmology. You could be making a tool that completely transforms their workflow and has an enormous impact on the pace of scientific discovery or on the replicability of their work. It's amazing for somebody in their early 20s to be able to do that." Even more, Schneider says, "Schmidt Scholars are inventing the future of science. They are pioneering a way of doing science that will be common in a few decades."

Matching Scholars to Projects

The Schmidt Academy's steering committee, composed of Schmidt Academy personnel, select Caltech and Harvey Mudd faculty, and a physicist from the Carnegie Institution for Science, considers the matching of scholars and projects to be one of its most important tasks. A call goes out to faculty in November to find out who might be interested in hosting a Schmidt Scholar. Faculty must then apply for Schmidt Scholars by mid-December, and these applications are competitive.

Breeden explains that the committee takes several factors into consideration. For one, she says, "you have to have that kind of 'Goldilocks' project size. If it's too broad in scope, it might not be feasible for a scholar to complete the work in a year or two." Second, the lab must include appropriate liaisons: "Dave Rumph will go to the group, interview the PI, and figure out where the latent expertise lies in the lab, whether it's grad students, postdocs, or undergrads," she says. "He's looking for one or two people in the group who can be a bridge between the Schmidt Scholar and the group, because it's all about facilitating that communication." This is key, says Rumph, because "one of the challenges is how prepared the lab is for a software engineer as opposed to a new grad student." Finally, the nature of the project is a consideration. Says Breeden, "We're always looking for projects where it's like 'Wow! If we applied quality software engineering here, it would just be rocket boosters on what they're up to scientifically.'"

Selecting scholars is equally important to the success of the Schmidt Academy. One priority when reviewing applicants is ensuring that they have an adequate background in math and the sciences. "We need software engineers who understand physics, math, biology, chemistry, if that's what they're working on," says Schneider. "They're hard to hire." This is why the Schmidt Academy has so far included only graduates from Caltech and Harvey Mudd: both colleges require all undergraduates to have broad exposure to a full range of science disciplines. (There are plans afoot to bring Schmidt Scholars in from additional colleges if they have the necessary background.)

Only then does the matching begin. "The available projects are introduced to the scholars, and they let us know which projects they would like to join," explains Pinkston. As part of their ongoing contribution to the Schmidt Academy, Pinkston and Rumph each take on one project after the scholars have been matched. This way both scholars and instructors are going through the same process at the same time, though with different projects.

The Schmidt Academy on the Ground

When new Schmidt Scholars arrive on campus in late July or early August, they have already been assigned to labs and made a commitment for a year's work with the option to continue for a second year. (Most scholars spend two years at Caltech.) The first thing they do is meet their fellow Schmidt Scholars and their mentors, Pinkston and Rumph. This is done via a boot camp run by Pinkston—an intensive version of Caltech's CS 130, a software engineering course offered during winter term. Scholars practice designing and testing small-scale projects. Grad students can also take the class, for credit or not, and those grad students who will be working closely with a Schmidt Scholar are especially encouraged to do so.

David Pitt, who just graduated from Harvey Mudd, was excited to begin his Schmidt Scholarship in this past August's boot camp. "I worked full time my senior year in addition to school, and I got a taste of working in industry, so I decided I wanted to try something else for a year," he says. Pitt will be working on an AI project involving "a new breed of neural networks called physics-informed neural networks."

After boot camp is completed, Schmidt Scholars continue to meet every week, taking turns presenting progress on their projects. In addition, they meet individually with either Rumph or Pinkston every two weeks, or more often if needed. As an initial step, the scholars interview members of their lab to learn their software requirements. Then, says Breeden, "they build the scaffolding. They identify deliverables for the first year of work and map out a practical timeline."

Because the Schmidt Scholars are embedded in the labs that will be using the software they develop, "the information exchanges happen on a daily basis," Van Valen says. "Within six months, they know enough about the science to be able to really move forward on the things they've been tasked with."

Schmidt Scholars do encounter some challenges that computer science grads rarely confront in industry. "A lot of the scholars are doing 'software archaeology,'" Breeden explains. "They're looking at scientific software packages that might have been written 20 or 30 years ago in languages that we don't teach anymore. They have to become proficient in FORTRAN, for example, or they need to read through flight software from a satellite launched decades ago."

Schneider's CliMA lab has had four Schmidt Scholars to date, more than any other research group on campus. "We're building an Earth system model that consists of an atmosphere model and a land model, and an ocean model that is mostly being developed at MIT," he explains. These climate models are enormous. "They have millions of lines of code. Really, aside from nuclear codes, these are probably the most complex pieces of scientific software that exist. So, part of what we do is to try to reduce this complexity to what we really need."

Julia Sloan, who graduated from Caltech in 2022 with a degree in computer science, is beginning her second year as a Schmidt Scholar in the CliMA lab. "I'm working on the land model and on the coupler, which is a component of the software that links together the individual pieces of the model: atmosphere, land, and ocean," she says. "Once this climate computing project is completed, it will be used for fast and accurate predictions of the global climate for decades or even centuries into the future."

"CliMA was an amazing project to work on," says Ben Mackay, an alumnus of the Schmidt Academy. "It's a blend of software, mathematical, and physical problems all simultaneously being solved. It was an incredibly dynamic and collaborative workplace that I feel privileged to have been a part of. My time with the Schmidt Academy certainly confirmed my desire to pursue a PhD in climate science." Mackay is now a PhD student in climate science at the Scripps Institution of Oceanography at UC San Diego.

Opportunities are diverse within the Schmidt Academy and give scholars the chance to pursue multiple interests. "I majored in computer science, but I also took physics classes during my time at Harvey Mudd," says second-year Schmidt Scholar Alex Hadley. "I'm interested in software engineering as a career, but I didn't want to turn away from my interest in physics." Hadley is working on a project called Software Platforms for Quantum Experiments that is supervised by Oskar Painter, the John G. Braun Professor of Applied Physics and Physics.

Schmidt Scholar Skylar Gering studied computer science and math at Harvey Mudd with an emphasis in environmental analysis. "I really wanted an opportunity to improve my software skills while working on a project that was meaningful to me," Gering says. "I interned with Pacific Northwest National Lab and the National Oceanic and Atmospheric Administration working on scientific software, but since Harvey Mudd is an undergraduate-only institution, I wanted to work in a graduate lab before deciding if I wanted to go to grad school myself."

Gering's project is focused on ocean–sea ice coupling. She explains, "We are seeing rapid decreases of sea ice globally. The biggest decreases are in the marginal ice zone around the edges of the pack. In those areas, the sea ice isn't one big sheet but rather lots of ice floes, which are distinct floating chunks of sea ice. We model these ice floes as polygons to better understand the dynamics in these areas."

The Contributions of the Schmidt Academy

Four years in, the Schmidt Academy is a success story, with more and more Caltech faculty eager to bring Schmidt Scholars into their labs. "We usually receive roughly twice as many proposals from faculty as we can support," Kaushik Bhattacharya, Caltech's vice provost and Howell N. Tyson, Sr., Professor of Mechanics and Materials Science, says.

With several classes of alumni now out in the world, benefits for the Schmidt Scholars are clearly visible. Some have gone on to graduate school in fields ranging from resource management to business to computational biology. Others have moved into industry, and some have been hired on as permanent software engineers at Caltech or other research universities.

Iman Wahle (BS '20) applied for graduate school at Princeton University directly out of college to study neuroscience. She was admitted but deferred for two years to be a Schmidt Scholar. "I learned so much about software development from the academy while simultaneously learning more about computational methods for analyzing high-dimensional neural data," she says.

The ambition of the Schmidt Academy is to transform the relationship between science and software. Over and over again, it has succeeded. "When people ask me about it," says Van Valen, "I say it lets you do a very different kind of science because the software engineering skill set is very enabling." The Van Valen lab has asked its Schmidt Scholars to train AI to recognize different types of cells in microscope images. "The AI is on a par with what humans are able to do," Van Valen says. The software developed by these Schmidt Scholars, called DeepCell, is available for public use.

Professor of Philosophy Frederick Eberhardt has also overseen the development of software that can be used by others in the field outside Caltech. He has worked with two Schmidt Scholars on the Causal Feature Learning project. "Together with Krzysztof Chalupka [PhD '17], a former Caltech computer science student, and Pietro Perona [Allen E. Puckett Professor of Electrical Engineering], I had developed a proof-of-concept code in 2014–16," Eberhardt says. "The Schmidt Scholars then took this code draft and turned it into a Python package that includes proper documentation and tutorials, and is designed in a modular way so that researchers can easily adapt the code for their specific settings."

The Schmidt Academy has become a living model of how good software can transform and accelerate science. It is not the only effort underway to engage software engineers in scientific research in university settings, but it is one of the first, say Rumph and Gurnis, and it is distinctive in the way it benefits both researchers and emerging software engineers. "It's a really unique experiment that Caltech and Schmidt Futures has launched," says Breeden. "I don't know of any other university that's running a program like this, and Caltech has been doing it for four years."

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.