Scientists at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany, working with researchers at the Wellcome Trust Sanger Institute in Cambridge, UK, and at the University of Washington and Harvard Medical School, both in the USA, have identified the genetic sequence of an unprecedented 28 000 structural variants (SVs): large portions of the human genome that differ from one person to another. Their detailed analysis drew on data from 185 human genomes sequenced in the course of the 1000 Genomes Project. The work, published today in Nature, could help find the genetic causes of some diseases and also begins to explain why certain parts of the human genome change more than others.
The international team of scientists identified more than a thousand SVs that disrupt the sequence of one or more genes. These gene-altering mutations may be linked to diseases, so knowing the exact genetic sequence of these variations will help clinical geneticists narrow down their searches for disease-causing mutations.
"Knowing the exact genetic sequence of SVs and their context in the genome could help find the genetic causes for as-yet unexplained diseases," says Jan Korbel, who led the research at EMBL. "This may help us understand why some people remain healthy until old age whereas others develop diseases early in their lives."
This unprecedented catalogue of large-scale genetic variants also sheds light on why some parts of the genome mutate more frequently than others. The scientists found that deletions, where genetic material is lost, and insertions, where it is gained, tend to happen in different places in the genome and through different molecular processes. For instance, large-scale deletions are more likely to occur in regions where DNA often breaks and has to be put back together, as 'chunks' of genetic material can be lost in the process.
"We found 51 hotspots where certain SVs, such as large deletions, appear to occur particularly often," Korbel says. "Six of those hotspots are in regions known to be related to genetic conditions such as Miller-Dieker syndrome, a congenital brain disease that can lead to infant death."
Previous research had already linked SVs, a class that includes copy-number variants, to many genetic conditions, such as colour-blindness, schizophrenia, and certain forms of cancer. However, because of their large size and complex DNA sequence, SVs were difficult to identify. In this study, the researchers overcame these difficulties by developing novel computational approaches that allowed them to pinpoint the exact locations of these large-scale variations in the genome, broadening the potential scope of future disease studies.
"There are many structural variants in everyone's genomes, and they are increasingly being associated with various aspects of human health," says Charles Lee, a clinical cytogeneticist and associate professor at Harvard Medical School and Brigham and Women's Hospital, and joint leader of the study. "It is important to be able to identify and comprehensively characterize these genetic variants using state-of-the-art DNA sequencing technologies."
Data from this study is being made publicly available to the scientific community through the 1000 Genomes Project, an international public-private consortium that aims to build the most detailed map of human genetic variation to date. The project plans to sequence 2500 whole genomes by the end of 2012, which would yield by far the largest collection of human genomes yet assembled.