Now scientists at the U.S. Department of Energy's Brookhaven National Laboratory have written a computer program "to sort the informational 'wheat' from the 'chaff,'" said Brookhaven biochemist John Shanklin, who leads the research team. The program, which is described in the open-access journal BMC Bioinformatics*, compares groups of related proteins and flags individual amino acid positions that are likely to control function.
Biochemists are interested in identifying "active sites" -- regions of proteins that determine their functions -- and learning how these sites differ between paralogs, proteins that arose from a common ancestor but now have different functions. The new program, called CPDL for "conserved property difference locator," identifies positions where two related groups of proteins differ either in amino acid identity or in a property such as charge or polarity.
"Experience tells us that such positions are likely to be biologically important for defining the specific functions of the two protein classes," Shanklin said.
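The kind of comparison described above can be illustrated with a minimal Python sketch. It is not the CPDL code itself: the function names, the simplified charge table, and the rule of skipping gapped columns are all assumptions made for illustration, and the input is assumed to be two groups of sequences that have already been aligned to the same length.

```python
# Illustrative sketch only; not the published CPDL implementation.
# Assumed, simplified charge classes (histidine is treated as neutral here).
CHARGE = {"D": "negative", "E": "negative", "K": "positive", "R": "positive"}


def charge_class(residue):
    """Return the assumed charge class of a one-letter amino acid code."""
    return CHARGE.get(residue, "neutral")


def locate_differences(group_a, group_b):
    """Flag alignment positions where two groups of sequences differ.

    Returns a list of (position, kind) tuples with 1-based positions, where
    kind is "identity" if each group has one conserved residue and the two
    residues differ, or "charge" if each group is conserved in charge class
    and the two classes differ.
    """
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")

    hits = []
    for i in range(length):
        col_a = {seq[i] for seq in group_a}
        col_b = {seq[i] for seq in group_b}
        if "-" in col_a or "-" in col_b:
            continue  # assumption: ignore columns that contain gaps
        if len(col_a) == 1 and len(col_b) == 1:
            # Both groups are conserved in identity at this position.
            if col_a != col_b:
                hits.append((i + 1, "identity"))
            continue
        prop_a = {charge_class(res) for res in col_a}
        prop_b = {charge_class(res) for res in col_b}
        if len(prop_a) == 1 and len(prop_b) == 1 and prop_a != prop_b:
            hits.append((i + 1, "charge"))
    return hits


if __name__ == "__main__":
    # Made-up toy sequences, not real enzymes.
    class_one = ["MKDLA", "MKELA", "MKDLA"]
    class_two = ["MRKLG", "MKRLG", "MRKLG"]
    print(locate_differences(class_one, class_two))
```

In this toy input, position 3 is flagged because one group always carries a negatively charged residue there and the other a positively charged one, even though neither group is conserved in identity; position 5 is flagged because each group has a single, different residue. A fuller treatment would presumably cover other properties such as polarity in the same way.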
When the Brookhaven team used the program to scan three test cases, each consisting of two groups of related but functionally different enzymes, the program consistently identified positions near enzyme active sites that had been previously predicted from structural and/or biochemical studies to be important for the enzymes' specificity and/or function. "This suggests that CPDL will have broad utility for identifying amino acid residues likely to play a role in distinguishing protein classes," Shanklin said.
Scientists have already used such comparative sequence analysis to identify protein active sites, and have also used this knowledge to alter enzyme functions, swapping particular amino acid residues in one class of enzyme to convert it into the related but functionally different class. But comparing sequences "manually" is labor intensive and error prone, and has become impractical for those who wish to take advantage of the increasing number of sequences in protein databases, Shanklin said.
"Yet this growing data resource contains a wealth of information for structure-function studies and for protein engineering," Shanklin said. "We developed CPDL as a general tool for extracting and displaying relevant functional information from such data sets."
Also, since CPDL does not require that a protein's structure be known -- just its amino acid sequence -- it can be applied to studies of proteins that reside in the cell membrane, for which it is notoriously difficult to determine a molecular structure.
The research was funded by the Office of Basic Energy Sciences within the U.S. Department of Energy's Office of Science, and by a Goldhaber Fellowship. The team included Brookhaven Goldhaber fellow Kim Mayer and bioinformaticist Sean McCorkle.
DOE's Office of Science was a founder of the Human Genome Project, a nationwide effort to generate the instrumentation and biological and computational resources necessary to sequence the entire human genome, identify all functional genes, and help transfer this information and related technology to the private sector for the benefit of society (see www.DOEgenomes.org). Studies of proteins, the "workhorses" that carry out the instructions of the genome, are a natural outgrowth of this work, with the potential to generate large returns of knowledge from this initial basic research investment.
One of ten national laboratories overseen and primarily funded by the Office of Science of the U.S. Department of Energy (DOE), Brookhaven National Laboratory conducts research in the physical, biomedical, and environmental sciences, as well as in energy technologies and national security. Brookhaven Lab also builds and operates major scientific facilities available to university, industry and government researchers. Brookhaven is operated and managed for DOE's Office of Science by Brookhaven Science Associates, a limited-liability company founded by Stony Brook University, the largest academic user of Laboratory facilities, and Battelle, a nonprofit, applied science and technology organization.
*Linking Enzyme Sequence to Function Using Conserved Property Difference Locator to Identify and Annotate Positions Likely to Control Specific Functions; Kimberly M Mayer, Sean R McCorkle and John Shanklin; BMC Bioinformatics 2005, 6:280 (30 November 2005): http://www.