Findings published today in Science will accelerate the search for genes involved in human disease. The report provides a first genome-wide view of how the unique composition of genetic variation within each of us leads to unique patterns of gene activity.
By identifying which genetic variants have a biological effect, the results will help researchers prioritise the regions of the genome to investigate for association with disease. This is an important step to understanding links between genes and disease for individuals, and across populations.
The Human Genome Project gave us the instruction manual for building a human. The HapMap and Copy Number Variation (CNV) Projects developed indices of where to find differences in the manuals of different people. One of the challenges for research into variation and disease is that most variants have no consequence for our wellbeing.
The new study gives a global view of the consequences of those differences for gene activity. The work shows that the activity of more than 1000 genes is affected by sequence variation. It is also the first map of human populations to identify the most important fraction of DNA variation: the variants that directly affect gene activity.
The research was led by scientists from the Wellcome Trust Sanger Institute, together with colleagues from the University of Cambridge, Hospital for Sick Children/University of Toronto and Harvard Medical School/Brigham and Women's Hospital.
Using the HapMap series of cell samples from four populations, they measured the activity of more than 14,000 genes in cells grown in culture. The cell samples provide a snapshot of gene activity in one cell type. The activity of each gene was then correlated with nearby genetic variation, as defined by two indices: the HapMap, an index of single-base changes (single nucleotide polymorphisms, or SNPs), and the new index of copy number variants (CNVs).
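As an illustration of what correlating a gene's activity with a nearby variant involves, here is a minimal Python sketch. It is not the study's own pipeline: it assumes a Spearman rank correlation between a variant and expression levels measured in the same individuals, with significance judged by shuffling the expression values (a permutation test). Genotypes are encoded as 0, 1 or 2 copies of one SNP allele, though the same test could take a CNV's copy number instead. The function names and toy values are hypothetical, and the step of choosing which variants count as "nearby" each gene is left out.

```python
import random
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(genotype, expression, n_perm=1000, seed=0) -> float:
    """Two-sided permutation p-value for one variant/expression association.

    Shuffling expression across individuals breaks any real link to the
    genotype, so the shuffled correlations approximate the null distribution.
    """
    rng = random.Random(seed)
    observed = abs(spearman(genotype, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point rounding does not hide exact ties.
        if abs(spearman(genotype, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 counts the observed data as one of the permutations.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: six individuals, SNP dosage vs. expression level.
    dosage = [0, 0, 1, 1, 2, 2]
    expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
    print(round(spearman(dosage, expression), 3))  # prints 0.956
    print(permutation_p_value(dosage, expression))
```

Ranks make the test less sensitive to a few extreme expression values. Because a survey like this repeats the test for thousands of genes and variants, a real analysis would also need to account for multiple testing.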
"We've been able to look back into our history and find changes that are older and likely to be shared among populations," explained Dr Manolis Dermitzakis, senior author and Project Leader at the Wellcome Trust Sanger Institute. "But we also find many that are newer and less widespread.
"These are part of our recent evolution and a step along the way to understanding the origin and personal consequences of genetic change, not least for our wellbeing. This is a first-generation map of biologically important DNA sequence variation."
Understanding the genetic basis of gene activity will help medical researchers provide individuals with information about their personal predisposition to disease.
The study was a massive undertaking. The team examined the activity of 14,000 genes in 210 unrelated individuals, drawing on HapMap genotype data for 700,000 SNPs located close to genes, as well as on 25,000 sites interrogated for potential structural variation to detect copy-number differences.
SNP and CNV variation correlated with altered activity in almost 900 and 240 genes, respectively. The HapMap has been invaluable in detecting variants involved in many diseases and these results suggest that the CNV index will prove similarly useful.
"The remarkable finding was that there is so little overlap in the genes found by using the two indices," commented Dr Matthew Hurles, also a leader of the project at the Wellcome Trust Sanger Institute. "Only about 10% of the activity variants associated with a CNV were also associated with a SNP.
"This suggests that we must include CNV studies in our searches for genetic variation associated with disease or we will be missing a lot of the important genetic effects."
The results show that at least 10-20% of heritable variation in gene activity is due to CNVs. Among the associations the team found were previously known examples, such as UGT2B17, a gene that has been associated with prostate cancer; recovering these confirms that the new approach works.
They also showed for the first time that the activity of other genes located close to UGT2B17 was affected. Detecting neighbouring effects in this way will help researchers pinpoint the critical genes within a candidate region of the genome.
Not every association was found in all four populations: two-thirds of them, whether involving CNVs or SNPs, were found in only one population. A gene implicated in spinal muscular atrophy showed an association in three populations, but not in the Yoruba from Ibadan, Nigeria. Population differences such as these can also shed light on our history.
Variation in copy number can affect gene activity by altering the 'dose' of a gene, by disrupting the active parts of a gene that contain the code for protein, or by disrupting the regulatory regions of the genome that control gene activity - the on/off and dimmer switches in our genome.
"Although the simplest model for a CNV affecting gene activity is where the variant is a deletion of a gene or part of a gene, we found examples where activity is affected from a distance," commented Barbara Stranger, first author and post-doctoral fellow at the Wellcome Trust Sanger Institute. "This may occur when the CNV reduces the effectiveness of a region that works to switch the genes on or off."
The survey gives the first global view of the effects of SNPs and CNVs on gene activity. The methods and resources developed will help researchers better understand the link between differences - large and small - in our genome and our health.
Notes for Editors
Publication details: Stranger BE et al. (2007) Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science
Matthew Hurles and Manolis Dermitzakis are corresponding authors.
Funding was provided by the Wellcome Trust, the National Institutes of Health, Cancer Research UK, the Leukemia and Lymphoma Society, the Brigham and Women's Hospital Department of Pathology, the UK Medical Research Council, the Royal Society and Genome Canada/Ontario Genomics Institute.
Participating centres:
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Department of Oncology, University of Cambridge, Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge, UK
- Istituto di Tecnologie Biomediche-Sezione di Bari, Consiglio Nazionale delle Ricerche (CNR), Bari, Italy
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, MaRS Centre, Toronto, Ontario, Canada
- Department of Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada
- Program in Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA
GENEVAR - GENe Expression VARiation database: www.sanger.ac.uk/genevar
Copy Number Variation Project: http://www.
Dr Dermitzakis' research: http://www.
Dr Hurles' research: http://www.
The Wellcome Trust Sanger Institute, which receives the majority of its funding from the Wellcome Trust, was founded in 1992 as the focus for UK sequencing efforts. The Institute is responsible for the completion of the sequence of approximately one-third of the human genome as well as genomes of model organisms such as mouse and zebrafish, and more than 90 pathogen genomes. In October 2006, new funding was awarded by the Wellcome Trust to enable the Institute to build on its world-class scientific achievements and exploit the wealth of genome data now available to answer important questions about health and disease. These programmes are built around a Faculty of more than 30 senior researchers. The Wellcome Trust Sanger Institute is based in Hinxton, Cambridge, UK.
The Wellcome Trust is the largest independent charity in the UK and the second largest medical research charity in the world. It funds innovative biomedical research, in the UK and internationally, spending around £500 million each year to support the brightest scientists with the best ideas. The Wellcome Trust supports public debate about biomedical research and its impact on health and wellbeing.