Completing the second phase of the 1000 Genomes Project, a multinational team of scientists reports that they have sampled a total of 1092 individuals from 14 different populations and sequenced their full genomes. The researchers described the feat as a collegial effort to equip biologists and physicians with information that can be used to understand the normal range of human genetic variants so that a patient's disease genome can be interpreted in a broader context.
A report on the research, published online in Nature on Nov. 1, represents the culmination of five years of work, says Aravinda Chakravarti, Ph.D., professor of medicine and pediatrics and a member of the Institute of Genetic Medicine at the Johns Hopkins School of Medicine. Chakravarti helped to design the population genetics sampling plan.
"The DNA donors in the study were not known to have any diseases, so the study gives us the genomic background we need for understanding which genetic variations are 'within the normal range,'" Chakravarti says. "With this tool, scientists now have a standard with which they can compare the genome of someone with diabetes, for example." That in turn, Chakravarti says, will increase opportunities for understanding the disease and creating targeted, individualized treatment.
The selection of the 14 populations sampled was based on their ancient migratory history and their genetic relationship to the other populations studied. Within each population, healthy, unrelated donors were randomly chosen for blood draws. The blood samples were first transformed into cell lines that can be stored and grown indefinitely so that they will always be available for future studies. After cell lines were grown, the DNA was sequenced and added to a public database.
The first human genome to be sequenced, published in 2003, made clear that as much as 98.5 percent of human genetic material does not encode proteins, as had been thought. Scientists now know the role of some of the non-protein-coding regions and, although much of the genome remains a mystery, there is reason to suspect that at least some of it plays a part in the variability seen in disease susceptibility and prevalence.
"The 1000 Genomes Project started at the beginning, with the whole genome and with no bias in the search for disease-related variants toward protein-coding genes," Chakravarti explains. "Regulatory sequences and sequences we still don't understand were also catalogued, so this information widens the areas of the genome we can search when looking for disease-causing variants." Most of the genetics research done to date has begun with a disease or a protein that is known to be malfunctioning, followed by a hunt for the responsible genetic variants.
The genetic variations found in the populations analyzed were categorized by how frequently they appeared in the individuals tested. Variants seen in more than five percent of the samples were classified as common variants, while low-frequency variants appeared in 0.5 to five percent of individuals and rare variants in less than 0.5 percent of the samples.
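The three frequency bins can be written as a simple rule. The Python sketch below is only an illustration of the thresholds quoted above, not the project's actual analysis code; the function name `classify_variant` and the carrier counts in the usage lines are hypothetical, and frequency is assumed to mean the fraction of sampled individuals carrying the variant.

```python
def classify_variant(frequency: float) -> str:
    """Bin a variant by the fraction of samples carrying it (0.0 to 1.0).

    Thresholds follow the article: more than 5 percent is common,
    0.5 to 5 percent is low-frequency, less than 0.5 percent is rare.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if frequency > 0.05:
        return "common"
    if frequency >= 0.005:
        return "low-frequency"
    return "rare"


# Hypothetical carrier counts out of the 1092 individuals sampled.
TOTAL_SAMPLES = 1092
for carriers in (3, 30, 100):
    print(carriers, classify_variant(carriers / TOTAL_SAMPLES))
```

In this sketch a variant seen in exactly five percent or exactly 0.5 percent of samples falls in the low-frequency bin, which matches the wording "more than five percent" and "less than 0.5 percent."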
The 14 populations sampled were divided into four ancestry groups: European, African, East Asian and American. As expected, most of the common variants had already been identified in previous studies, and their frequencies varied little between ancestry groups.
By contrast, 58 percent of the low-frequency variants and 87 percent of the rare variants were described for the first time in this study. Rare variants were sometimes twice as likely to be found within a particular population as in that population's broader ancestry group. Different populations also showed different numbers of rare variants, with the Spanish, Finnish and African-American populations carrying the greatest number of them.
Amazingly, Chakravarti says, the researchers found that among rare variants, the healthy people in their study each possessed 130 to 400 protein-altering variants; 10 to 20 variants that destroy the function of the proteins they encode; two to five variants that damage protein function; and one or two variants associated with cancer. The implication is that all healthy people everywhere carry similar numbers of rare, deleterious variants.
Several factors allow people to survive with so many errors in their genomes, Chakravarti explains. One factor is that genes occur in pairs, yet the body often requires only one normal copy to work. Another is that a "redundant" gene elsewhere in the genome can sometimes compensate for a specific deficiency. In addition, some deleterious genes are turned on only in response to certain environmental cues that a particular individual may never encounter.
The first phase of the 1000 Genomes Project, led by Chakravarti, Peter Donnelly at Oxford and David Altshuler at the Broad Institute of MIT and Harvard, was completed in 2008. It was a preliminary probe into the genomes of a subset of the individuals sequenced for this second phase and proved to be illuminating in searching for genetic markers of disease. The final phase of the project will involve sequencing the genomes of 1500 more individuals from 11 more populations.
More than 100 authors from 111 institutions worldwide contributed to this study. This work was supported by grants from the National Institutes of Health and many other international funding agencies.
The following communities generously donated samples: Yoruba in Ibadan, Nigeria; Han Chinese in Beijing, China; Japanese in Tokyo, Japan; a Mormon community in Utah, United States; Luhya in Webuye, Kenya; people with African ancestry in the Southwestern United States; Tuscans in Italy; people with Mexican ancestry in Los Angeles, Calif., United States; Southern Han Chinese in China; British in England and Scotland; Finnish in Finland; Iberian populations in Spain; Colombians in Medellin, Colombia; and Puerto Ricans in Puerto Rico.
Disclosure: Chakravarti is on the scientific advisory board for Affymetrix and Biogen Idec.
On the Web:
Chakravarti lab: https:/
Project website: http://www.