New Haven, Conn. -- A study by Yale researchers offers a new view of what causes the greatest genetic variability among individuals -- suggesting that it is due less to single point mutations than to the presence of structural changes that cause extended segments of the human genome to be missing, rearranged, or present in extra copies.
"The focus for identifying genetic differences has traditionally been on point mutations or SNPs -- changes in single bases in individual genes," said Michael Snyder, the Cullman Professor of Molecular, Cellular & Developmental Biology and senior author of the study, which was published in Science Express. "Our study shows that a considerably greater amount of variation between individuals is due to rearrangement of big chunks of DNA."
Although the original human genome sequencing effort was comprehensive, it left regions that were poorly analyzed. Recently, investigators found that even in healthy individuals, many regions in the genome show structural variation. This study was designed to fill in the gaps in the genome sequence and to create a technology to rapidly identify structural variations between genomes at very high resolution over extended regions.
"We were surprised to find that structural variation is much more prevalent than we thought and that most of the variants have an ancient origin. Many of the alterations we found occurred before early human populations migrated out of Africa," said first author Jan Korbel, a postdoctoral fellow in the Department of Molecular Biophysics & Biochemistry at Yale.
To look at structural variants that were shared or different, DNA from two females -- one of African descent and one of European descent -- was analyzed using a novel DNA-based methodology called Paired-End Mapping (PEM). Researchers broke up the genomic DNA into manageable pieces about 3,000 bases long; tagged and rescued the paired ends of the fragments; and then analyzed their sequence with a high-throughput, rapid-sequencing method developed by 454 Life Sciences.
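The general logic of paired-end mapping can be illustrated with a minimal sketch. It is not the study's actual pipeline. It assumes each pair of fragment ends has already been placed on a reference genome, and it flags pairs whose spacing or orientation departs from what a roughly 3,000-base fragment should give. The `MappedPair` type, the `classify_pair` function, and the 1,000-base tolerance are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass

EXPECTED_SPAN = 3000  # approximate fragment length given in the text
TOLERANCE = 1000      # hypothetical cutoff, NOT taken from the study


@dataclass
class MappedPair:
    """Two ends of one fragment after placement on a reference genome."""
    left_pos: int         # reference coordinate of the first end
    right_pos: int        # reference coordinate of the second end
    orientation_ok: bool  # True if the ends face each other as expected


def classify_pair(pair: MappedPair,
                  expected: int = EXPECTED_SPAN,
                  tol: int = TOLERANCE) -> str:
    """Label a pair by comparing its reference span with the expected span."""
    if not pair.orientation_ok:
        # Ends pointing the wrong way suggest a flipped segment.
        return "inversion candidate"
    span = pair.right_pos - pair.left_pos
    if span > expected + tol:
        # Ends lie too far apart on the reference: the sampled genome
        # is probably missing sequence that the reference contains.
        return "deletion candidate"
    if span < expected - tol:
        # Ends lie too close together on the reference: the sampled genome
        # probably carries extra sequence between them.
        return "insertion candidate"
    return "concordant"


if __name__ == "__main__":
    examples = [
        MappedPair(10_000, 13_050, True),
        MappedPair(10_000, 21_000, True),
        MappedPair(10_000, 10_800, True),
        MappedPair(10_000, 13_000, False),
    ]
    for p in examples:
        print(p, "->", classify_pair(p))
```

In practice a single discordant pair is weak evidence; a real analysis would presumably require several independent pairs pointing to the same region before calling a variant.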
"454 Sequencing can generate hundreds of thousands of long read pairs that are unique within the human genome to quickly and accurately determine genomic variations," explained Michael Egholm, a co-author of the study and vice president of research and development at 454 Life Sciences.
"Previous work, based on point mutations, estimated that there is a 0.1 percent difference between individuals, while this work points to a level of variation between two and five times higher," said Snyder.
"We also found 'hot spots' -- particular regions where there is a lot of variation," said Korbel. "While these regions may be still actively undergoing evolution, they are often regions associated with genetic disorder and disease."
"These results will have an impact on how people study genetic effects in disease," said Alex Eckehart Urban, a graduate student in Snyder's group, and one of the principal authors on the study. "It was previously assumed that 'landmarks,' like the SNPs mentioned earlier, were fairly evenly spread out in the genomes of different people. Now, when we are hunting for a disease gene, we have to take into account that structural variations can distort the map and differ between individual patients."
"While it may sound like a contradiction," said Snyder, "this study supports results we have previously reported about gene regulation as the primary cause of variation. Structural variation of large spans of the genome will likely alter the regulation of individual genes within those sequences."
According to the authors, even in healthy people, there are variants in which part of a gene is deleted or sequences from two genes are fused together without destroying the cellular activity with which they are associated. They say these findings show that the "parts list" of the human genome may be more variable, and possibly more flexible, than previously thought.
Other authors from Yale, in addition to primary authors Alex E. Urban and Jan Korbel, who is also affiliated with the European Molecular Biology Laboratory in Heidelberg, Germany, are Fabian Grubert, Philip Kim, Dean Palejev, Nicholas Carriero, Andrea Tanzer, Eugenia Saunders, Sherman Weissman, and Mark Gerstein. The research was funded by the National Institutes of Health, a Marie Curie Fellowship, the Alexander von Humboldt Foundation, The Wellcome Trust, Roche Applied Science and the Yale High Performance Computation Center.
Citation: Science: Science Express (online), September 28, 2007.