image: UCR computer scientist Mingxun Wang in his laboratory. Wang created the new programming language for scientists.
Credit: Stan Lim/UCR
Biologists and chemists have a new programming language to uncover previously unknown environmental pollutants at breakneck speed – without requiring them to code. By making it easier to search massive chemical datasets, the tool has already identified toxic compounds hidden in plain sight.
Mass spectrometry data is like a chemical fingerprint, showing scientists what molecules are in a sample such as air, water, or blood, and in what amounts. It helps identify everything from pollutants in water to chemicals in new medicines.
Developed at UC Riverside, Mass Query Language, or MassQL, functions like a search engine for mass spectrometry data, enabling researchers to find patterns that would otherwise require advanced programming skills. Technical details about the language, and an example of how it helped identify flame retardant chemicals in public waterways, are described in a new Nature Methods journal article.
“We wanted to give chemists and biologists, who are generally not also computer scientists, the ability to mine their data exactly how they want to, without having to spend months or years learning to code,” said Mingxun Wang, UCR assistant professor of computer science, who created the language.
Demonstrating the effectiveness of the language, Nina Zhao, a UCR postdoctoral student now at UC San Diego, used MassQL to sift through all of the world’s publicly available mass spectrometry data on water samples. She was looking for organophosphate esters, which are generally found in flame retardants.
“There are quite literally a billion measurements of molecules in this data. You cannot go through it manually,” said Wang. “However, the language acts like a filter, in a sense, for these chemicals, and it pulled out thousands of them.”
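The filter idea Wang describes can be illustrated with a minimal Python sketch. This is not MassQL itself, and it is not the query used in the paper. The `Spectrum` class, the function names, the sample peaks, and the target value are all hypothetical and chosen only for illustration. The sketch keeps the spectra that contain a peak within a small tolerance of a target mass-to-charge (m/z) value.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Spectrum:
    """A toy spectrum: an identifier plus (m/z, intensity) peaks."""
    scan_id: str
    peaks: List[Tuple[float, float]]


def has_peak(spectrum, target_mz, tolerance_mz=0.01, min_intensity=0.0):
    """Return True if any peak is within tolerance_mz of target_mz
    and at least min_intensity high."""
    return any(
        abs(mz - target_mz) <= tolerance_mz and intensity >= min_intensity
        for mz, intensity in spectrum.peaks
    )


def filter_spectra(spectra, target_mz, tolerance_mz=0.01, min_intensity=0.0):
    """Keep only the spectra that contain a matching peak."""
    return [
        s for s in spectra
        if has_peak(s, target_mz, tolerance_mz, min_intensity)
    ]


# Made-up example data, not real measurements.
spectra = [
    Spectrum("scan_1", [(98.984, 1200.0), (150.100, 300.0)]),
    Spectrum("scan_2", [(77.039, 800.0), (105.070, 950.0)]),
    Spectrum("scan_3", [(98.990, 40.0), (250.200, 500.0)]),
]

TARGET_MZ = 98.984  # hypothetical target value, an assumption for this sketch

matches = filter_spectra(spectra, TARGET_MZ, tolerance_mz=0.01, min_intensity=100.0)
print([s.scan_id for s in matches])  # prints ['scan_1']
```

In this toy run, `scan_3` has a peak close to the target but falls below the intensity cutoff, so only `scan_1` passes. A real search would apply the same kind of condition across the roughly one billion measurements Wang mentions, which is why a dedicated query language helps.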
In addition to finding known chemicals in the water samples, they also found organophosphate compounds that have not been previously described or catalogued, and some chemicals that are the product of organophosphates breaking down over time.
“These chemicals can cause a lot of problems for human and animal health, and for entire ecosystems. They were designed to be flame retardants or plasticizers, but they can cause endocrine and sexual system disruptions, as well as cardiovascular problems,” Zhao said.
Before plans can be made for handling or removing toxic chemicals from our environment, scientists need to know what is present. That’s where MassQL comes in handy for scientists like Zhao.
“The language allows me to track everything that’s ever been detected in all data on air, soil, water, and even in the human body. Whatever exists, we can search for chemicals in there,” she said.
One of the challenges in creating MassQL was getting life scientists to reach a consensus on the definitions of the terms the software would use. “Both chemists and computer scientists have to understand it, and the software has to be able to operate on it,” Wang said.
For this reason, about 70 scientists were consulted during the development phase. They gave feedback on the most important terms and how to express them in the MassQL language.
The research team also wanted to demonstrate that the language could be useful in a variety of real-life situations. In addition to Zhao’s project, the paper details more than 30 potential applications of MassQL.
Sample use cases include detecting fatty acids as markers of alcohol poisoning, searching for new drugs to address the looming antibiotic resistance crisis, learning about the chemicals that bacteria use to communicate with one another, and finding forever chemicals on playgrounds.
In the past, Wang would get requests for software that could look for data patterns specific to all of these different kinds of applications.
“I thought I could do something to save myself time,” he said. “I wanted to create one language that could handle multiple kinds of queries. And now we have. I’m excited to hear about the discoveries that could come from this.”
Journal: Nature Methods
Article Title: A universal language for finding mass spectrometry data patterns
Article Publication Date: 12-May-2025