Researchers have invented computational tools to decode and rapidly determine whether natural compounds collected in oceans and forests are new--or if these pharmaceutically promising compounds have already been described and are therefore not patentable.
This University of California, San Diego advance will finally enable scientists to rapidly characterize ring-shaped nonribosomal peptides (NRPs)--a class of natural compounds of intense interest due to their potential to yield or inspire new pharmaceuticals. The study will be published in the July 13 online issue of the journal Nature Methods.
"These advances will speed the process by which we discover and describe new and biologically active molecules from organisms such as marine cyanobacteria, also known as blue-green algae. This, in turn, will accelerate the timeline for bringing new experimental therapies into clinical application," said William Gerwick, an author on the paper and a professor with the UC San Diego Scripps Institution of Oceanography Center for Marine Biotechnology and Biomedicine and the UCSD Skaggs School of Pharmacy and Pharmaceutical Sciences. (Read about Gerwick's work to discover drugs and protect Panama's natural and cultural resources at:
Nonribosomal peptides (NRPs) often serve as chemical defenses for the bacteria that manufacture them. Starting from penicillin, NRPs have an unparalleled track record in pharmacology: most anti-cancer and anti-microbial agents are natural products or their derivatives. However, it is currently difficult, time-consuming and costly to determine the molecular structure of NRPs which, by definition, are not directly inscribed in the genomes of the organisms that produce them.
"NRPs are one of the last bastions of pharmacologically important biological compounds that remain virtually untouched by computational research. As a result, it is currently one of the most painfully slow processes, it is a real bottleneck that we have now removed," said Pavel Pevzner, a computer science professor at UC San Diego's Jacobs School of Engineering and the corresponding author on the Nature Methods paper.
Researchers can now separate known compounds from those that are unknown.
"If I collect 1,000 ocean compounds, why waste time with compounds that are already known or patented?" added Nuno Bandeira, co-lead author on the paper, director of UC San Diego's Center for Computational Mass Spectrometry (CCMS) and a researcher at the UC San Diego division of Calit2, the California Institute of Telecommunications and Information Technology.
"Our algorithms can tell natural product researchers what their compounds are. Manual annotations should be something of the past," said Julio Ng, a co-lead author on the Nature Methods paper and a doctoral student in Bioinformatics at UC San Diego.
"Compound 879," for example, is a cyclic NRP discussed in the Nature Methods paper that was thought to be novel when it was isolated. A lengthy and expensive patenting process, however, uncovered that compound 879 had already been described as an antibiotic and named neoviridogrisen. The new UC San Diego algorithms would have quickly identified this fact. These algorithms make sense of the flood of tiny peptide fragments that are generated by machines called mass spectrometers that blast nonribosomal peptides apart and determine their sizes.
Two complementary processes are used to glean insights from data generated from the mass spectrometers that break the cyclic peptides into smaller and smaller linear pieces.
First, the authors present new algorithms that computers use to piece these peptide fragments back together in order to determine the chemical structure of a cyclic NRP. This is called "De Novo sequencing of NRPs."
Second, the researchers created "dereplication" tools for moving in the other direction: taking the chemical structures of known NRPs and other related information and determining what the data signature would look like if a mass spectrometer had blown the compound apart.
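The dereplication idea can be illustrated with a minimal Python sketch. This is not the authors' algorithm; it is a simplified illustration under stated assumptions: the peptide is a simple ring of standard amino acids, a fragment's mass is just the sum of its residue masses (ionization, water loss and nonstandard residues are ignored), and candidates are ranked by a plain shared-peak count. The function names, the tolerance value and the library entries are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Monoisotopic residue masses (Da) for a few standard amino acids.
# Real NRPs often contain nonstandard residues; this table is illustrative only.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146,
    "A": 71.03711,
    "P": 97.05276,
    "V": 99.06841,
    "L": 113.08406,
    "F": 147.06841,
}


def cyclic_fragment_masses(residues: str) -> List[float]:
    """Predict the masses of all linear pieces a ring of residues can break into.

    Every contiguous run of 1..n-1 residues, starting at any position on the
    ring, is one fragment; the intact ring is added once at the end.
    """
    masses = [RESIDUE_MASS[r] for r in residues]
    n = len(masses)
    doubled = masses + masses  # lets a run wrap around the ring
    out: List[float] = []
    for start in range(n):
        total = 0.0
        for length in range(1, n):
            total += doubled[start + length - 1]
            out.append(total)
    out.append(sum(masses))
    return sorted(out)


def shared_peak_count(
    observed: Sequence[float], predicted: Sequence[float], tol: float = 0.02
) -> int:
    """Count observed masses that lie within tol Da of some predicted mass."""
    return sum(1 for m in observed if any(abs(m - p) <= tol for p in predicted))


def rank_library(
    observed: Sequence[float], library: Dict[str, str]
) -> List[Tuple[str, int]]:
    """Score every known structure against the observed masses, best first."""
    scores = [
        (name, shared_peak_count(observed, cyclic_fragment_masses(seq)))
        for name, seq in library.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical library of known cyclic peptides (names are made up).
    library = {"cyclo-GAVL": "GAVL", "cyclo-GAPF": "GAPF"}
    # Pretend the instrument measured exactly the fragments of cyclo-GAVL.
    observed = cyclic_fragment_masses("GAVL")
    for name, score in rank_library(observed, library):
        print(name, score)
```

In this toy setup, a compound whose measured fragments line up with a library entry's predicted fragments would be flagged as already known, without needing a previously recorded spectrum for that entry.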
"Natural products have a long history in therapeutic development and many were discovered before the digital recording of mass spectrometry data. Therefore, we do not have an extensive mass spectrometry database for natural products as we do for proteomics. Our new tools enable dereplication without an experimental database to compare to," said Pieter Dorrestein, assistant professor in the UC San Diego Skaggs School of Pharmacy and Pharmaceutical Sciences and the Departments of Pharmacology, Chemistry and Biochemistry.
By using these two approaches, the researchers have created tools that enable researchers to both characterize the compound they have isolated and check to see if it, or something similar, has been previously described. With dereplication, researchers can leverage known information and are not forced to start from scratch each time a new compound needs to be identified.
"As long as the structure of the therapeutic or a related therapeutic or natural product is in the library, we can accurately dereplicate the molecule. This is the first generation of algorithms that can accomplish this and is a glimpse into the future of modern drug discovery."
Performing de novo sequencing without knowing amino acid masses is completely novel, according to Bandeira. "Until we created them, there were no algorithmic approaches available to do this from mass spectrometry data and it was generally thought to be impossible," said Bandeira, who earned his Ph.D. in computer science from the UC San Diego Jacobs School of Engineering.
The work allows mass spectrometry to enter the natural products field and carry out the identification and characterization of natural products in a high-throughput fashion, explained Ng, a bioinformatics Ph.D. student advised by Pavel Pevzner in computer science and Pieter Dorrestein in the Skaggs School of Pharmacy.
The researchers note that currently there is no one place to look for known NRPs, a situation they are trying to change with a new data repository effort.
The UC San Diego web-based tools for sequencing nonribosomal peptides are available at no cost to researchers at: bix.ucsd.edu/nrp
"This new study has shown that marine cyanobacteria are incredible sources of new molecules that may have medical value, especially in cancer, infectious diseases and neurological disorders," said Gerwick.
This project was supported by US National Institutes of Health grants 1-P41-RR024851-01, GM086283 and CA100851, and by the PhRMA Foundation.
"Dereplication and De Novo Sequencing of Nonribosomal Peptides," by Julio Ng,1,8 Nuno Bandeira,2,8 Wei-Ting Liu,3 Majid Ghassemian,3 Thomas L. Simmons,4 William Gerwick,4,5 Roger Linington,6 Pieter Dorrestein,3,5 and Pavel Pevzner2,7
1 Bioinformatics Program, University of California San Diego, La Jolla, California 92093
2 Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093
3 Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California 92093
4 Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92037
5 Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92092
6 Department of Chemistry, University of California Santa Cruz, Santa Cruz, California 95064
7 Corresponding author: Pavel Pevzner firstname.lastname@example.org
8 Authors contributed equally
CCMS is a joint effort between the Computer Science and Engineering (CSE) department of the Jacobs School of Engineering, and the UCSD division of the California Institute for Telecommunications and Information Technology (Calit2).
Pavel Pevzner is also director of the Calit2-based Center for Algorithmic and Systems Biology (CASB) and is a Howard Hughes Medical Institute (HHMI) professor.