The studies, one by Mark Daly, Eric Lander, and colleagues, and the other by John Rioux and colleagues at Whitehead Genome Center, provide the impetus for building a "haplotype" map of the genome--a map that will make it easier, faster, and perhaps cheaper to find disease-causing or disease-predisposing genes.
Haplotypes are ancestral segments of chromosomes that contain many single-letter genetic variations inherited together as a set or a block, and they can be used to decipher the genetic differences that make some people more susceptible to disease than others. Haplotypes became an important concept when scientists began to realize that single nucleotide polymorphisms (SNPs)--the single-letter DNA differences between individuals that comprise most genetic variation and thus underlie disease susceptibility--travel together in blocks that are quite large. If this were indeed the case for the entire genome, a haplotype map would make finding disease genes a manageable task. Instead of searching through a giant haystack of millions of SNPs, scientists would be searching through bundles of 10,000 to 50,000 bases each.
The Whitehead studies provide a strong case for building a haplotype map. One study suggests that large segments of the genome may be modular, with genetic variations traveling together as large blocks that come in very few varieties.
The other study identifies a common haplotype wherein lies a gene for susceptibility to Crohn's disease, a chronic inflammatory bowel disease (IBD) that affects more than one million Americans. This study served both as the clue to and as an example of how haplotype maps can be useful in identifying genes for common diseases.
The findings will be published in accompanying papers in the October issue of Nature Genetics. The Whitehead Institute's collaborators are listed in the paper and include Thomas Hudson from the Montreal Genome Center, McGill University in Quebec, Canada.
Crohn's disease is a so-called "complex" disorder, with a tendency to cluster in families, suggesting that several genes play an important role, but that environment is also a key component. Scientists had previously identified a gene on chromosome 16 as a culprit, but this gene could only account for a fraction of the IBD cases.
In this study, researchers at the Whitehead identified a neighborhood on chromosome 5 wherein lies another gene, IBD5, involved in the disease. The gene lies in a region surrounded by a cluster of interleukins--genes that are involved in immune function and regulation. "We were very excited to identify this region--it made perfect sense given the inflammatory nature of these diseases," says John Rioux, first author on the Crohn's disease study and research scientist at the Whitehead Center for Genome Research. "This region may also be important in other inflammatory diseases besides IBD, such as lupus and asthma."
Researchers believe that in Crohn's patients, faulty responses to microbes that live in the digestive system may somehow trigger the immune system to attack the lining of the digestive tract, causing it to decay and become inflamed. "Finding a gene in a region known to be important in immunity may help us understand the disease mechanism and design better therapies," says Rioux.
Rioux and his colleagues identified all the SNPs in a large region of chromosome 5 implicated by their previous research. When they looked at these SNPs in individuals affected by Crohn's disease and those who were not, they found an entire block of variation, or haplotype, that correlated with disease. One of the many SNPs that uniquely mark that haplotype, or a combination of such SNPs, is a candidate for causing disease. Of these unique SNPs, none caused changes in amino acid sequence in the proteins encoded by the known genes. This could mean the disease-causing SNP is in a regulatory region of a known gene and controls levels of expression of the gene, or there may be an as-yet-unidentified gene in the region that is mutated. The researchers will now turn to molecular biology to identify the culprit.
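The comparison described above, asking whether a haplotype is more common in affected than in unaffected individuals, can be illustrated with a minimal Python sketch. The data are invented for illustration, the four-letter strings and the names `cases`, `controls`, and `haplotype_frequencies` are hypothetical, and a simple frequency difference stands in for the statistical tests a real study would use; this is not the researchers' method.

```python
from collections import Counter

# Hypothetical toy data: each string is one chromosome's alleles at four SNPs.
cases = ["ACGT", "ACGT", "ACGT", "GTAC", "ACGT", "GTAC"]
controls = ["GTAC", "GTAC", "ACGT", "GTAC", "GTAC", "GTAC"]

def haplotype_frequencies(chromosomes):
    """Return the fraction of chromosomes carrying each haplotype."""
    counts = Counter(chromosomes)
    total = len(chromosomes)
    return {hap: n / total for hap, n in counts.items()}

case_freq = haplotype_frequencies(cases)
control_freq = haplotype_frequencies(controls)

# Frequency in cases minus frequency in controls, for every haplotype seen.
enrichment = {
    hap: case_freq.get(hap, 0.0) - control_freq.get(hap, 0.0)
    for hap in set(case_freq) | set(control_freq)
}

# The haplotype most over-represented among affected individuals.
most_enriched = max(enrichment, key=enrichment.get)
print(most_enriched, round(enrichment[most_enriched], 2))
```

In this toy data set the haplotype "ACGT" appears on four of six case chromosomes but only one of six control chromosomes, so it is the one flagged as correlating with disease.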
The tools and approach used to localize the IBD gene will be broadly applicable to many complex diseases such as asthma, diabetes, heart disease, and psychiatric illness. "The simple patterns in human variation we describe in this paper exist in the general population and aren't specific to any disease or any particular ethnic background," says Whitehead Fellow Mark Daly, first author on the paper on the haplotype structure.
It was while using SNPs to dissect the region of chromosome 5 with the IBD5 gene that the researchers noticed that SNPs travel together in large blocks. This suggested that researchers won't have to search through every single SNP in an area of the genome to find one responsible for disease. Instead, researchers could simply look at a handful of key SNPs and know the identity of tens or hundreds of other neighboring SNPs. "This is the first time that we see a way to study the whole genome comprehensively," says Daly.
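The idea that a handful of key SNPs reveals the identity of their neighbors can be shown with a small Python sketch. It assumes a hypothetical block with three known haplotypes across six SNP positions; the sequences and the function name `infer_haplotype` are made up for illustration.

```python
# Hypothetical block: three common haplotypes across six SNP positions.
block_haplotypes = ["AAGCTT", "GAGTTC", "GCACTC"]

def infer_haplotype(observed, haplotypes):
    """Return the haplotypes consistent with alleles seen at a few positions.

    `observed` maps a SNP position (0-based index) to the allele found there.
    """
    return [
        hap for hap in haplotypes
        if all(hap[pos] == allele for pos, allele in observed.items())
    ]

# Typing only the first two SNPs is enough to pin down the whole block.
matches = infer_haplotype({0: "G", 1: "C"}, block_haplotypes)
print(matches)
```

Because only one of the three known haplotypes carries "G" at the first position and "C" at the second, the alleles at the remaining four positions follow without being measured.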
Daly and his colleagues also found that these haplotypes (a given set of SNPs) exist regionally in only two to four distinct sequence patterns. Basically, if a researcher is looking at a particular block of the genome, there will frequently be fewer than five flavors of variation of the sequence in that region across entire populations. They also observed that the blocks are separated by regions where considerable shuffling has occurred over generations, so each individual may have a unique combination of these blocks. Such shuffling--or recombination--occurs naturally in cells when DNA sequences on maternal and paternal chromosomes are exchanged during the formation of egg or sperm.
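A short Python sketch, using invented data, shows how a few patterns per block can still yield many combinations once recombination shuffles the blocks. Each hypothetical chromosome below is written as a pair of block patterns; the sequences are illustrative only.

```python
# Hypothetical chromosomes, each described by its pattern in two adjacent blocks.
chromosomes = [
    ("AAG", "TC"),
    ("AAG", "CT"),
    ("GCA", "TC"),
    ("GCA", "CT"),
    ("AAG", "TC"),
    ("GTA", "CT"),
]

# Number of distinct patterns observed within each block.
patterns_per_block = [len(set(block)) for block in zip(*chromosomes)]

# Number of distinct block combinations observed across whole chromosomes.
distinct_combinations = len(set(chromosomes))

print(patterns_per_block, distinct_combinations)
```

Here the first block comes in three varieties and the second in two, yet five different combinations appear among just six chromosomes.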
"Understanding human variation at this level will have a big impact on medical genetics in the future. The length and complexity of these blocks is going to vary in different parts of the genome. We now need to characterize the whole genome--create haplotype maps--so this type of work can be done easily for any disease, anywhere in the genome," says Daly.
If the architecture of the blocks and the existing haplotypes are mapped, for instance, then a researcher studying a particular disease will be able to pick a few SNPs from every block in the genome and study this set in patients. Sequencing large populations of patients for all available SNPs is still a costly and time-consuming process. This type of block-based yet comprehensive analysis will help scientists more rapidly identify key SNPs that correlate with the disease of interest.