Image: Dr. Tomáš Pluskal, head of the Biochemistry of Plant Specialized Metabolites group at IOCB Prague
Credit: Tomáš Belloň/IOCB Prague
Scientists from the laboratory of Dr. Tomáš Pluskal at IOCB Prague are helping colleagues around the world identify previously unknown compounds. They have created an extensive library called MSⁿLib, which contains several million records showing how small molecules “break apart” when measured by mass spectrometry. Until now, comparable databases have grown only very slowly, but thanks to a new approach developed at IOCB Prague, spectral data for a compound can now be obtained in a matter of minutes. This could speed up drug discovery, improve the monitoring of chemical substances in the environment, and support further advances in artificial intelligence for biomedicine. An article about the library has been published in the journal Nature Methods.
[Video: https://youtu.be/FDeIwm-BHls]
Mass spectrometry reveals the composition of chemical substances and is a key tool in medicine, pharmacy, and environmental research. The instrument breaks a compound into smaller parts, and from these fragments scientists determine the structure of the original molecule. Fragment spectra, which can be imagined as a fingerprint unique to each substance, are compared with already known spectra stored in libraries. However, existing databases have covered only a limited number of known compounds, making the search considerably more difficult.
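To make the library comparison step more concrete, here is a minimal sketch of one common way to score how closely a measured fragment spectrum matches a library entry: a greedy cosine similarity over peaks whose m/z values agree within a tolerance. This is an illustrative assumption, not the scoring used by MSⁿLib or mzmine. The function names, the 0.01 m/z tolerance, and the toy spectra ("compound_A", "compound_B") are hypothetical and exist only for this example.

```python
import math
from typing import Dict, List, Tuple

# A spectrum is a list of (m/z, intensity) peaks (hypothetical representation).
Spectrum = List[Tuple[float, float]]


def cosine_score(query: Spectrum, reference: Spectrum, tolerance: float = 0.01) -> float:
    """Greedy cosine similarity: pair peaks within `tolerance` m/z, each peak used once."""
    # Collect every candidate pair of peaks whose m/z values agree within the tolerance.
    candidates = []
    for i, (mz_q, int_q) in enumerate(query):
        for j, (mz_r, int_r) in enumerate(reference):
            if abs(mz_q - mz_r) <= tolerance:
                candidates.append((int_q * int_r, i, j))

    # Keep the highest-scoring pairs first, so each peak contributes at most once.
    candidates.sort(reverse=True)
    used_q, used_r = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            dot += product

    # Normalize by the vector lengths of both spectra (all peaks, matched or not).
    norm_q = math.sqrt(sum(inten ** 2 for _, inten in query))
    norm_r = math.sqrt(sum(inten ** 2 for _, inten in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def best_match(query: Spectrum, library: Dict[str, Spectrum],
               tolerance: float = 0.01) -> Tuple[str, float]:
    """Return the library entry with the highest cosine score and that score."""
    scores = {name: cosine_score(query, spec, tolerance) for name, spec in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Toy, made-up spectra used only to demonstrate the lookup.
library = {
    "compound_A": [(91.05, 100.0), (119.05, 40.0), (147.04, 10.0)],
    "compound_B": [(77.04, 100.0), (105.03, 60.0)],
}
query = [(91.054, 95.0), (119.049, 45.0), (147.04, 5.0)]

name, score = best_match(query, library)
print(name, round(score, 3))  # compound_A ranks first; compound_B shares no peaks
```

Real spectral search tools typically refine this basic idea, for example by weighting intensities, accounting for precursor mass shifts, or requiring a minimum number of matched peaks. A larger library simply means more reference entries for the query to be scored against.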
Tomáš Pluskal and his team have significantly advanced the development of spectral libraries. By the time they prepared their study for Nature Methods, they had compiled a catalog of thirty thousand small molecules, for which they recorded two million high-quality spectra. Rather than settle for a rough picture, they used multistage fragmentation (MSⁿ), i.e. repeated breaking of molecules and their fragments, to obtain a more detailed view of the molecules' internal structure. Such a comprehensive data set is available to the scientific world for the first time. Tomáš Pluskal explains: “During the twenty years I’ve worked in this field, spectral libraries have not expanded much. We managed to change this practice and created the largest database currently in existence. Moreover, we’ve made it openly available to the global scientific community.”
The researchers also substantially accelerated the analysis itself: they can measure ten compounds at once, and the entire process takes only a minute and a half. Because Pluskal’s team is exceptionally well known and active in the global scientific community, they have received thousands of compounds as gifts from companies and institutions. “Since writing the article in Nature Methods, we’ve advanced further. So far, we’ve processed about 70,000 compounds, and we have another 150,000 awaiting analysis. We continue uploading data online, and by the end of the year we’d like to reach 200,000 measured compounds. That’s roughly ten times more than has been available over the past twenty years,” says the first author of the article, Dr. Corinna Brungs.
Tomáš Pluskal and his colleagues are also using the enormous amount of new data to improve AI algorithms that autonomously recognize unknown chemical substances – from metabolites in the human body to compounds in plants and microorganisms. Scientists “feed” the machine learning model with data from the chemical library. The more data it receives, the more accurately the model can predict, based on the supplied spectrum, what the molecule behind the spectrum might look like.
The spectral library was created using the open-source software mzmine, which enabled automated processing of a vast number of measurements. As a result, the resource is not only extensive but also easily usable for further scientific projects worldwide.
IOCB Prague / Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences (www.uochb.cz) is a leading internationally recognized scientific institution whose primary mission is the pursuit of basic research in chemical biology and medicinal chemistry, organic and materials chemistry, chemistry of natural substances, biochemistry and molecular biology, physical chemistry, theoretical chemistry, and analytical chemistry. An integral part of IOCB Prague’s mission is putting the results of basic research into practice. Emphasis on interdisciplinary research gives rise to a wide range of applications in medicine, pharmacy, and other fields.
Journal
Nature Methods
Article Title
MSnLib: efficient generation of open multi-stage fragmentation mass spectral libraries
Article Publication Date
15-Sep-2025