One-letter switches in the DNA code occur much more frequently in human genomes than anticipated, but are often only found in one or a few individuals.
The abundance of rare variations across the human genome is consistent with the population explosion of the past few thousand years, medical geneticists and evolutionary biologists report in the May 17 advance online edition of Science.
"This is a dramatic example of how recent human history has profoundly shaped patterns of genetic variation," said Joshua Akey, University of Washington associate professor of genome sciences and a senior author of the study. His lab studies the genetic architecture behind differences among humans (as well as among other species) and the mechanisms of evolutionary change.
Although so-called single nucleotide variants are rare, they may influence a person's resistance or susceptibility to common diseases, like heart or lung trouble or blood problems. The rarity of each specific variation means that scientists will often need to study DNA samples from very large numbers of people to draw any genetic links to these disorders.
Researchers already realize that commonly occurring gene variants have only a modest role in the complex medical conditions with the most public health repercussions.
In this week's paper, "Evolution and Functional Impact of Rare Coding Variations from Deep Sequencing of Exomes," investigators described their study of the protein-coding sections of genomes from almost 2,440 individuals. The participants were 1,351 people of European extraction and 1,088 of African ancestry.
The study is a first step toward understanding how rare genetic variants contribute to some of the chronic illnesses that are among the world's leading causes of death.
It was conducted as part of the mission of the Seattle GO at the University of Washington and the Broad GO at Harvard University and MIT, both funded by the National Institutes of Health's National Heart, Lung, and Blood Institute Exome Sequencing Project. The exome consists of the protein-coding regions of the genome.
The overall project encompasses a great many individuals who have distinct traits, such as heart attacks before old age, strokes, or a high body mass index, to discover the genes and molecular mechanisms behind these conditions.
Low-cost, rapid sequencing of whole genomes is on its way to becoming clinically feasible. The information gleaned would be more useful if statistical and experimental methods could more accurately identify gene variations that regulate biological processes and produce functionally significant proteins.
Such methods would link gene variations to disease causes and provide information for preventing and treating diseases.
The other senior author of the paper from the Exome Sequencing Project is Michael J. Bamshad, University of Washington professor of pediatrics in the Division of Genetic Medicine. Researchers from eight institutions across the nation collaborated.
The group sequenced and compared 15,585 human protein-coding genes. They located more than a half-million single-letter DNA code variations in their sample populations.
The majority of these variations arose recently in human evolutionary history and so were rare, novel, and specific to either the African or the European study population, the researchers discovered. The researchers went on to pick just those single-letter variations in the DNA that might affect the functions of proteins. Alterations in protein functions are among the key ways genetic differences translate into disease traits.
They estimated that a little more than 2 percent of the approximately 13,600 single nucleotide variations each person carried, on average, influenced the function of about 313 genes per genome. More than 95 percent of the single-letter code changes predicted to be functionally important were rare in the overall study population.
How did so many rare variations affecting protein function arise in the human genetic code? The researchers suggest that this excess of rare variations is due to a combination of demographic and evolutionary forces.
Both European and African populations grew exponentially beginning around 10,000 years ago, but in the past 5,000 years growth rates accelerated, leading to the billions of people living today. The dramatic recent increase in population size has therefore profoundly influenced the spectrum of protein-coding variation present in humans.
The scientists calculated the average number of novel, single-letter code variations in their study subjects: 549 per individual overall. People of African descent had about twice as many new variations as those of European descent: 762 versus 382.
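As a quick arithmetic check on these figures, the short Python sketch below computes the ratio of the two group averages and a sample-size-weighted mean. It assumes the group sizes reported for the study (1,351 participants of European ancestry and 1,088 of African ancestry); the variable names are illustrative and are not taken from the paper.

```python
# Per-individual averages of novel variants, as quoted in the article.
novel_african = 762
novel_european = 382

# Group sizes as reported for the study participants.
n_african = 1088
n_european = 1351

# How many times more novel variants per person in the African-ancestry group.
ratio = novel_african / novel_european

# Mean across all participants, weighting each group average by its size.
weighted_mean = (
    novel_african * n_african + novel_european * n_european
) / (n_african + n_european)

print(f"African/European ratio: {ratio:.2f}")
print(f"Sample-size-weighted mean: {weighted_mean:.1f}")
```

The ratio comes out just under 2, in line with "about twice." The weighted mean from these rounded inputs lands within a few variants of the reported 549 but not exactly on it; the article does not say how the overall figure was computed.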
The researchers measured the effects of natural selection on rare coding variation. To do so, they also brought in genetic details from genes highly specific to humans relative to chimps and macaques to look for what are called "selective sweeps."
A selective sweep occurs when natural selection increases the frequency of a beneficial variant in a population. The beneficial variant doesn't travel alone. Nearby genetic material is swept along with it. Included among the genes the scientists culled out as affected by positive selection were those related to the sense of smell and to the use of energy.
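The rise in frequency of a favored variant can be illustrated with a textbook deterministic model of selection in a haploid population. The sketch below is not the method used in the study; the function name, the starting frequency, and the selection coefficient are assumptions chosen purely for illustration, and the model tracks only the favored variant itself, not the nearby genetic material carried along with it.

```python
def sweep_trajectory(p0, s, generations):
    """Return the frequency of a beneficial variant over time.

    Hypothetical illustration: each generation, carriers of the variant
    leave (1 + s) offspring for every 1 left by non-carriers, so the
    frequency p becomes p * (1 + s) / (1 + s * p).
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        freqs.append(p)
    return freqs


# Assumed values: the variant starts in 1 of every 1,000 genome copies
# and confers a 5 percent reproductive advantage.
trajectory = sweep_trajectory(p0=0.001, s=0.05, generations=400)

for generation in (0, 100, 200, 300, 400):
    print(generation, round(trajectory[generation], 4))
```

Under these assumed values the variant stays rare for many generations, then climbs steeply and approaches a frequency of 1, which is the characteristic S-shaped rise of a sweep.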
The researchers also learned that most of the protein-coding variations identified in their study were predicted to be harmful. Rare variation contributes not simply to each individual's uniqueness, but also to their risk for life-shortening illnesses.
What are the implications of these findings for understanding disease and advancing personalized medicine? Before answering, the researchers pointed to present limitations in robustly identifying functionally important gene variation.
"Nevertheless," they said, "there was considerable rare genetic variation among individuals that is predicted to be functional, which could explain variability in disease risk and in drug response."
The researchers would like more powerful tests to detect the effects of rare genetic variations on human health. They suggest that analyzing variation gene by gene might improve research methods.
They added that the population-specific nature of most of the single-letter code changes will make it challenging to replicate disease associations with a variant across the world's people.
In addition to Akey and Bamshad, other researchers on the study were Jacob A. Tennessen, Timothy D. O'Connor, Wenqing Fu, Sean McGee, Mark J. Rieder, and Deborah A. Nickerson, all of the UW Department of Genome Sciences; Abigail W. Bigham of the UW Department of Pediatrics; Eimear E. Kenny, Simon Gravel, and Carlos D. Bustamante of Stanford University; Ron Do, Stacey Gabriel, David Altshuler, and Shamil Sunyaev of the Broad Institute of MIT and Harvard University; Xiaoming Liu and Eric Boerwinkle of the Texas Health Sciences Center in Houston; Goo Jun, Hyun Min Kang, and Goncalo Abecasis of the University of Michigan; Daniel Jordan of the Division of Genetics at Brigham & Women's Hospital in Boston; and Suzanne M. Leal of the Department of Molecular and Human Genetics at Baylor College of Medicine. The Center for Human Genetic Research at Massachusetts General Hospital and the Human Genome Sequencing Center at Baylor College of Medicine also contributed to this study.