The technique, reported in the April 29 issue of the journal, was developed by researchers from the University of California, Berkeley, Harvard and Princeton universities, and the National Institutes of Health.
Genes that change slowly or not at all in an organism, or from one organism to another, usually turn out to be critical pieces of molecular machinery and, in an infectious organism, attractive targets for researchers hoping to kill it.
Conversely, genes that change rapidly are presumed to be under selective evolutionary pressure, such as the need for a microbe to continually switch its outer coat to escape detection by the human immune system. Such genes can tell researchers how organisms outwit the immune system or develop drug resistance.
This new technique is a total departure from current methods of finding rapidly evolving genes, and has already pinpointed previously unknown genes in the tuberculosis bacterium and the malaria parasite that could be drug targets.
"In the typical comparative method, researchers take equivalent genes from several organisms, like humans and chimps and mice, line them up and count the differences," explained coauthor Hunter B. Fraser, a graduate student in molecular and cell biology at UC Berkeley. "That gives you an idea of what kinds of changes a gene has undergone over evolution, and from the kinds of changes you see, you can infer something about the way it is evolving - whether it has been pressured to change or pressured to stay the same.
"We're coming out with a similar end result - knowing what kinds of evolutionary pressures are on different genes - but we can do it with just a single genome sequence, instead of lining up genes from different genomes and comparing sequences."
Fraser works in the laboratory of Michael Eisen, a UC Berkeley adjunct assistant professor of molecular and cell biology and a member of the QB3 consortium (California Institute for Quantitative Biomedical Research).
"This technique can be used to quickly identify pathogenic genes that interact closely with the human immune system, since these genes are under tremendous pressure to evolve quickly," said coauthor Joshua B. Plotkin, a junior fellow in the Faculty of Arts and Sciences at Harvard. "Such genes are prime targets for new drugs and vaccines to counter deadly pathogens."
The technique involves a statistical analysis of an entire genome, comparing the rate of change of a specific gene to the average rate of change within the genome. An organism's genome is a sequence of DNA nucleotides - either A, G, T or C (for adenine, guanine, thymine and cytosine) - grouped into triplets, called codons. Each codon codes for a specific amino acid to be strung together to create a protein. The series thymine, cytosine and adenine - a TCA codon - always yields a serine amino acid, for example.
Because 64 DNA triplets can be made from the four available DNA nucleotides but there are only 20 different amino acids, some amino acids are coded by more than one codon. Arginine, for example, is coded by six different codons: CGA, CGC, CGG, CGT, AGA and AGG.
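The redundancy described above can be checked directly. The following is a minimal sketch in Python that builds the standard genetic code table and groups codons by the amino acid they encode. The names `CODON_TABLE` and `codons_by_amino_acid` are illustrative choices, not anything from the researchers' work, and amino acids are written with their one-letter abbreviations (R for arginine, S for serine, `*` for a stop signal).

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# One letter per amino acid; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons_by_amino_acid(table):
    """Group codons by the amino acid (or stop signal) they encode."""
    groups = defaultdict(list)
    for codon, amino_acid in table.items():
        groups[amino_acid].append(codon)
    return {aa: sorted(codons) for aa, codons in groups.items()}


groups = codons_by_amino_acid(CODON_TABLE)
print(len(CODON_TABLE))    # 64
print(CODON_TABLE["TCA"])  # S (serine)
print(groups["R"])         # ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT']
```

Grouping the table this way shows which amino acids have several codons to choose among, which is the property the technique depends on.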
Based on an idea by Plotkin, the team zeroed in on the susceptibility of codons to point mutations - alteration of a single DNA nucleotide - and the fact that not all point mutations have the same effect. A random point mutation in some codons is less likely to create a codon that codes for a different amino acid. For example, the conversion of CGA to CGC would still result in an arginine, leaving the protein's amino acid sequence unchanged. Based on the structure of the genetic code - that is, the translation table connecting codons to amino acids - the group was able to tell which codons were more likely to have been mutated into a codon for a different amino acid.
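The idea that point mutations affect some codons more than others can be made concrete with a short sketch. It scores a codon by the fraction of its single-nucleotide neighbors that code for a different amino acid. This is one plausible reading of the description above, not necessarily the exact formula the team used: it assumes every point mutation is equally likely and leaves neighbors that are stop codons out of the count. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def point_mutation_neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def volatility(codon):
    """Fraction of non-stop neighbors that encode a different amino acid.

    Assumption: all point mutations are equally likely, and neighbors that
    are stop codons are ignored.
    """
    amino_acid = CODON_TABLE[codon]
    sense = [n for n in point_mutation_neighbors(codon) if CODON_TABLE[n] != "*"]
    changed = sum(CODON_TABLE[n] != amino_acid for n in sense)
    return changed / len(sense)


# Two of the six arginine codons, scored under this definition.
print(volatility("CGA"))  # 0.5
print(volatility("AGA"))  # 0.75
```

Under these assumptions, CGA and AGA both code for arginine but differ in how exposed they are to amino-acid-changing mutations, so a gene's choice between them carries information.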
By counting, for example, the frequency of the six codons coding for arginine in a single gene, and comparing it to the frequency throughout the full genome, the researchers are able to determine whether the gene has likely evolved faster or slower than the genome as a whole.
"We add up over an entire gene which triplets it's using, and then we ask, 'Would we expect to see this kind of usage of triplets just by chance or not?'" Fraser said. "If not, it's unusual and gives us a clue to how the gene has been evolving."
"We need the whole genome sequence because we have to learn, for each genome, what its background distribution of triplets is," he added. "If we didn't know that, we wouldn't be able to find a gene with a significant departure from that."
The technique only works with some amino acids. The new results come from an analysis of arginine, leucine and serine, each of which is coded for by six different codons, and glycine, which is coded for by four different codons.
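The comparison of one gene against the genome's background can be sketched as a simple randomization test. The sketch is an illustration only and not the researchers' actual statistics: it assumes a volatility score equal to the fraction of non-stop point-mutation neighbors that change the amino acid, it redraws each codon from genome-wide synonymous codon frequencies, and it runs on invented toy data. All function names, the trial count and the example sequences are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
INFORMATIVE = set("RLSG")  # arginine, leucine, serine, glycine


def volatility(codon):
    """Assumed score: share of non-stop neighbors encoding another amino acid."""
    aa = CODON_TABLE[codon]
    neighbors = [codon[:i] + b + codon[i + 1:]
                 for i in range(3) for b in BASES if b != codon[i]]
    sense = [n for n in neighbors if CODON_TABLE[n] != "*"]
    return sum(CODON_TABLE[n] != aa for n in sense) / len(sense)


def background_usage(genome_codons):
    """Genome-wide codon counts, grouped by encoded amino acid."""
    usage = defaultdict(Counter)
    for codon in genome_codons:
        usage[CODON_TABLE[codon]][codon] += 1
    return usage


def volatility_p_value(gene_codons, usage, trials=2000, seed=0):
    """Share of randomized genes at least as volatile as the observed gene.

    Each randomized gene keeps the amino acid sequence but redraws every
    codon from the genome-wide usage for that amino acid. Every amino acid
    scored in the gene must therefore also occur in `usage`.
    """
    rng = random.Random(seed)
    scored = [c for c in gene_codons if CODON_TABLE[c] in INFORMATIVE]
    observed = sum(volatility(c) for c in scored)
    hits = 0
    for _ in range(trials):
        total = 0.0
        for codon in scored:
            counts = usage[CODON_TABLE[codon]]
            pick = rng.choices(list(counts), weights=list(counts.values()))[0]
            total += volatility(pick)
        hits += total >= observed
    return hits / trials


# Toy data: a "genome" that uses two arginine codons equally often.
usage = background_usage(["CGA"] * 500 + ["AGA"] * 500)
p_fast = volatility_p_value(["AGA"] * 20, usage)  # only high-volatility codons
p_slow = volatility_p_value(["CGA"] * 20, usage)  # only low-volatility codons
print(p_fast, p_slow)
```

A small value flags a gene whose codon choices are more volatile than the genome's background would predict by chance, while a value near 1 flags the opposite. The background table has to be learned from the whole genome, which is why a complete sequence is needed.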
The team, which included Jonathan Dushoff, a postdoctoral researcher at Princeton and the NIH, used its technique to analyze the 4,000 genes in the genome of the tuberculosis bacterium (Mycobacterium tuberculosis) and the 5,000 genes in the genome of the malaria parasite (Plasmodium falciparum).
The genes in these organisms that turned out to be rapidly evolving were largely those genes coding for antigens, that is, proteins that coat the surface of the pathogen and incite an immune response. By constantly changing its antigen coat, a pathogen can elude the immune system, evolving eventually into a new strain to challenge the human immune system again.
"The fact that we found most antigens were quickly evolving under our metric confirmed that our technique works," Fraser said.
The researchers also discovered previously unrecognized genes that are evolving rapidly. These genes are attractive candidates for further research into which genes may be interacting with the human immune system.
"We also found that within classes of antigens, some are under much stronger selection than others, which people hadn't found before," he said. "We are able to make hypotheses about which ones are actually interacting with the immune system and which ones are not, based on this new finding."
Fraser emphasized that the technique, referred to as codon volatility, complements the comparative gene methods in common use today. Codon volatility reveals recent evolutionary pressure on genes, while comparative methods reveal evolutionary pressure over millions of years.
The codon volatility method has limitations, however, he said. It relies on the proportion of each of the four DNA nucleotides being fairly uniform across the entire genome of an organism. In humans, the proportion varies from place to place in the genome. Nevertheless, Fraser said the group is at work modifying the method to analyze codon volatility in the human genome.
The work was supported by the Harvard Society of Fellows and the NIH.