News Release

Evolution reveals a link between DNA and protein shape

Peer-Reviewed Publication

PLOS

Fifty years after the pioneering discovery that a protein's three-dimensional structure is determined solely by the sequence of its amino acids, an international team of researchers has taken a major step toward fulfilling a tantalizing promise: predicting the structure of a protein from its sequence alone. This advance could open a series of doors for previously intractable research into important biological processes and development of novel therapeutic drugs.

The team from Harvard Medical School (HMS), Politecnico di Torino / Human Genetics Foundation Torino (HuGeF) and Memorial Sloan-Kettering Cancer Center in New York (MSKCC) reported their results on Dec. 7 in the journal PLoS ONE.

In molecular biology and biomedical engineering, knowing the shape of protein molecules is key to understanding how they perform the work of life, to uncovering the mechanisms of disease and to designing drugs. Normally scientists determine the shape of protein molecules by expensive and complicated experiments, but for most proteins these experiments have not yet been done, leaving many crucial biological questions unanswered.

In principle, this problem could be solved by computing a protein's shape based simply on its sequence, which is relatively easy to determine from its DNA. Despite limited success for some smaller proteins, however, this challenge has remained essentially unsolved. The difficulty lies in the astronomically large number of possible shapes for each protein; without any shortcuts, it would take a supercomputer many years to explore all of these options and find the right one for even a small protein.
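To give a rough sense of that scale, the back-of-the-envelope sketch below counts the shapes an exhaustive search would have to visit. The figures are illustrative assumptions, not numbers from the study: three possible conformations per amino acid and a hypothetical machine that checks 10^15 shapes per second.

```python
# Back-of-the-envelope estimate of an exhaustive search over protein shapes.
# All constants are illustrative assumptions, not values from the study.
STATES_PER_RESIDUE = 3          # assumed conformations per amino acid
SAMPLES_PER_SECOND = 10**15     # assumed speed of a hypothetical supercomputer
SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_search_years(n_residues,
                            states=STATES_PER_RESIDUE,
                            rate=SAMPLES_PER_SECOND):
    """Years needed to visit every shape of a chain of n_residues."""
    conformations = states ** n_residues   # exact integer count of shapes
    return conformations / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(f"{n:>3} residues: {exhaustive_search_years(n):.3g} years")
```

Under these assumptions the required time triples with every added amino acid, which is why a shortcut, rather than a faster computer, is what the problem calls for.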

"Experimental structure determination has a hard time keeping up with the explosion in genetic sequence information," said Debora Marks, a mathematical biologist in the Department of Systems Biology at HMS, who worked closely with Lucy Colwell, a mathematician who recently moved from Harvard to Cambridge University. The two researchers collaborated with physicists Riccardo Zecchina and Andrea Pagnani in Torino in a team effort initiated by Marks and computational biologist Chris Sander of the Computational Biology Program at MSKCC, who had earlier attempted a similar solution to the problem when substantially fewer sequences were available.

The international team tested a bold premise: that evolution can provide a roadmap to how a protein folds. Their approach combined three key elements: evolutionary information accumulated over many millions of years; data from high-throughput genetic sequencing; and a key method from statistical physics, co-developed in the Torino group with Martin Weigt, who recently moved to the University of Paris.

"Collaboration was key," Sander said. "As with many important discoveries in science, no one could provide the answer in isolation."

Using the accumulated evolutionary information, in the form of the sequences of thousands of proteins grouped into families of proteins likely to have similar shapes, the team developed an algorithm to infer which parts of a protein interact to determine its shape. With these internal protein interactions in hand, the researchers used widely used molecular simulation software developed by Axel Brunger at Stanford University to generate the atomic details of the protein shape.
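The idea of reading interactions out of a protein family can be illustrated with a much simpler stand-in than the team's algorithm. The sketch below scores every pair of positions in a small, made-up alignment by mutual information, a basic measure of how strongly two positions vary together. It is not the statistical-physics method described in this release; the toy sequences and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy "protein family": positions 1 and 4 always change together
# (K pairs with E, E pairs with K); position 2 varies independently.
TOY_ALIGNMENT = [
    "AKLGE", "AKVGE", "AELGK", "AEVGK",
    "AKLGE", "AEVGK", "AKVGE", "AELGK",
]


def column(alignment, i):
    """Return the letters found at position i across all sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def rank_pairs(alignment):
    """Score every pair of positions and sort from most to least covarying."""
    length = len(alignment[0])
    scores = {
        (i, j): mutual_information(column(alignment, i), column(alignment, j))
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for pair, score in rank_pairs(TOY_ALIGNMENT)[:3]:
        print(pair, round(score, 3))
```

In this toy family the pair of positions (1, 4) ranks first, which is the kind of signal that would be read as a hint that the two positions sit close together in the folded protein. Real families contain thousands of sequences, and a simple pairwise score like this one can be misled by indirect correlations.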

Using this process, the team was able for the first time to compute remarkably accurate shapes from sequence information alone for a test set of 15 diverse proteins, with no protein size limit in sight.

"Alone, none of the individual pieces are completely novel, but apparently nobody had put all of them together to predict 3D protein structure," Colwell said.

The researchers caution that their method does have some weaknesses. Experimental structures, when available, generally are more accurate in atomic detail, and the method works only when researchers have genetic data for large protein families – but advances in DNA sequencing have yielded a torrent of such data that is forecast to continue growing exponentially in the foreseeable future.

The next step, the researchers say, is to predict the structures of unsolved proteins currently being investigated experimentally, before exploring the large uncharted territory of currently unknown protein structures. "Synergy between computational prediction and experimental determination of structures is likely to yield increasingly valuable insight into the large universe of protein shapes that crucially determine their function and evolutionary dynamics," Sander said.

###

Citation: Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, et al. (2011) Protein 3D Structure Computed from Evolutionary Sequence Variation. PLoS ONE 6(12): e28766. doi:10.1371/journal.pone.0028766

Funding: CS and RS have support from the Dana Farber Cancer Institute-Memorial Sloan-Kettering Cancer Center Physical Sciences Oncology Center (NIH U54-CA143798). LC is supported by an Engineering and Physical Sciences Research Council fellowship (EP/H028064/1). TH has support from the German National Academic Foundation. RZ has support from European Community grant 267915. No other financial support was received for the research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

This press release was written by Roger Alan Leo in the HMS Communications office.

LINK TO THE FREELY AVAILABLE ARTICLE: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0028766

