Scientists from the Genome Sequencing Center (GSC) at Washington University School of Medicine in St. Louis will publish the completed DNA sequences of human chromosomes 2 and 4 in the April 7 issue of Nature.
With this publication, the GSC completes its contributions to initial human genome sequencing and early inventory of potentially interesting genetic features in the 23 pairs of human chromosomes. Researchers at the GSC were primarily responsible for chromosomes 2, 4, 7 and Y, producing the initial analyses of more than 20 percent of the human genome.
Other institutions that contributed to the sequencing and analysis of chromosomes 2 and 4 include the University of Washington School of Medicine, the European Molecular Biology Laboratory, Pennsylvania State University, Stanford Human Genome Center and Lawrence Livermore National Laboratory.
Chromosome 2 and chromosome 4 are approximately 237 million base pairs and 186 million base pairs long, respectively. Scientists confirmed the existence of a total of 1,346 protein-coding genes on chromosome 2 and 796 protein-coding genes on chromosome 4.
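As a rough illustration of what those figures imply, the short Python sketch below converts them into gene density (protein-coding genes per million base pairs). The function and variable names are hypothetical, chosen only for this example, and the lengths are the approximate values quoted above rather than exact assembly sizes.

```python
# Approximate figures quoted in the text above (not exact assembly sizes).
CHROMOSOMES = {
    "chromosome 2": {"length_mbp": 237, "protein_coding_genes": 1346},
    "chromosome 4": {"length_mbp": 186, "protein_coding_genes": 796},
}


def genes_per_mbp(gene_count: int, length_mbp: float) -> float:
    """Return protein-coding genes per million base pairs."""
    return gene_count / length_mbp


# Gene density for each chromosome, keyed by chromosome name.
densities = {
    name: genes_per_mbp(info["protein_coding_genes"], info["length_mbp"])
    for name, info in CHROMOSOMES.items()
}

# Combined length of the two chromosomes, in millions of base pairs.
combined_length_mbp = sum(info["length_mbp"] for info in CHROMOSOMES.values())

for name, density in densities.items():
    print(f"{name}: {density:.1f} genes per million base pairs")
print(f"combined length: {combined_length_mbp} million base pairs")
```

By this estimate, chromosome 2 carries about 5.7 confirmed protein-coding genes per million base pairs and chromosome 4 about 4.3.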
Included on chromosomes 2 and 4 are genes previously linked to Huntington's disease, polycystic kidney disease, a form of muscular dystrophy, and Wolf-Hirschhorn syndrome, a condition that causes severe birth defects and mental retardation.
Human chromosome 2, the second-largest human chromosome, originated during human evolution through the fusion of two ancestral chromosomes whose chimpanzee counterparts were recently renamed chimp chromosomes 2a and 2b. Other scientists had previously identified the area where the two chromosomes fused together. The new analysis further highlights the remnants of that merger, including a region of about 2.6 million base pairs where the sequence is similar to that found around centromeres, central chromosome structures that are important for the proper separation of chromosomes during cell division.
"Inside that region is a tract of about 36,000 base pairs that features a repetitive sequence typical of the centromeres themselves, and we think that may be the remnant of the centromere of one of the two chimp chromosomes that merged to form human chromosome 2," says lead author LaDeana Hillier, senior research scientist at the GSC.
The new insight into chromosome 2's ancestry has senior author Rick Wilson, Ph.D., director of the GSC, interested in identifying other genomic relics of evolutionary change.
"These data raise the possibility of a new tool for studying genome evolution," Wilson says. "We may be able to find other chromosomes that have disappeared over the course of time by searching other mammals' DNA for similar patterns of duplication."
Scientists identified some of the human genome's largest gene deserts on chromosomes 2 and 4. These are large regions of DNA that contain very little in the way of protein-building instructions.
"For example, there are two regions of chromosome 2 that are each almost 10 million base pairs long each surrounding a single gene called protocadherin," Hillier says.
Researchers have found evidence that the protein made by the protocadherin gene is active in the heart and the brain. The protein made by the gene is thought to function in cell-to-cell recognition and adhesion but hasn't been definitively characterized yet. The function of the deserts around the gene is elusive.
"The deserts contain short, specific non-coding segments that may well be sites of gene regulation such as transcription factor binding sites--areas on the DNA where molecules can bind to change the activity of the protocadherin gene or other genes," says Hillier. "As we compared these areas to other genomes, we were intrigued to find both these short segments and this gene desert structure have been maintained in mammals and birds."
The presence of the deserts in other genomes suggests they may have important regulatory functions that researchers have yet to identify, according to Hillier.
Also included on chromosome 2 is the longest protein-coding sequence yet identified, a gene called titin that spans 280,000 base pairs and produces a muscle protein that is more than 33,000 amino acids long. Protein length varies widely, but typically averages about 500 amino acids.
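To put the titin figures in perspective, the following Python sketch works through the arithmetic. It assumes the standard three bases per amino acid and treats 33,000 amino acids as a lower bound, so the results are approximations. The variable names are illustrative only.

```python
BASES_PER_AMINO_ACID = 3  # one codon encodes one amino acid

titin_gene_span_bp = 280_000      # span of the gene on chromosome 2 (from the text)
titin_protein_length_aa = 33_000  # "more than 33,000", so treated as a lower bound
typical_protein_length_aa = 500   # approximate average quoted in the text

# Minimum number of bases needed to encode the protein itself.
titin_coding_bp = titin_protein_length_aa * BASES_PER_AMINO_ACID

# Share of the gene's span that directly encodes amino acids.
coding_fraction = titin_coding_bp / titin_gene_span_bp

# How many times longer titin is than a typical protein.
fold_longer = titin_protein_length_aa / typical_protein_length_aa

print(f"coding bases: at least {titin_coding_bp:,}")
print(f"share of gene span that is coding: about {coding_fraction:.0%}")
print(f"length relative to a typical protein: about {fold_longer:.0f}x")
```

Under these assumptions, roughly a third of the gene's 280,000-base-pair span directly encodes the protein, and titin is about 66 times longer than a typical protein.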
Scientists identified several "hypervariable" regions, regions where the sequence of base pairs shows significant variation among individuals.
"Our most highly variable region had 75 differences in a 5,000 base pair segment," Hillier notes. "Normally there will be one or two differences every thousand base pairs, and we had set three differences per five thousand base pairs as our initial threshold for searching for these regions."
Hillier and her colleagues sequenced several of these regions in a panel of 24 ethnically diverse people, and confirmed that these blocks of extreme variation occur regularly rather than randomly and appear to arise from two distinct underlying patterns. They then checked the regions in the chimp genome.
"In general, they were also highly variable in chimps," Hillier says. "It's going to take a lot more work to figure out what's going on in these regions, but some of these occur near known genes and studying them will greatly facilitate the study of human genetic variation and, at least in some cases, its correlation with disease."
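The variation figures Hillier quotes can be compared on a common scale, and the idea of searching for densely variable segments can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the study. The fixed non-overlapping windows, the function names, and the input format (a list of 0-based positions where individuals differ) are all assumptions made for the example, and the threshold is left as a parameter.

```python
def differences_per_kb(differences: int, segment_bp: int) -> float:
    """Convert a raw difference count into differences per 1,000 base pairs."""
    return differences * 1000 / segment_bp


def flag_windows(positions, region_bp, window_bp=5000, min_differences=3):
    """Return (window_start, count) for windows with at least min_differences.

    positions: 0-based coordinates of observed differences (hypothetical input).
    Windows are fixed and non-overlapping, which is a simplification.
    """
    n_windows = -(-region_bp // window_bp)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 0 <= pos < region_bp:
            raise ValueError(f"position {pos} lies outside the region")
        counts[pos // window_bp] += 1
    return [
        (index * window_bp, count)
        for index, count in enumerate(counts)
        if count >= min_differences
    ]


# The most variable region: 75 differences in a 5,000 base pair segment.
peak_rate = differences_per_kb(75, 5000)

# Compared with the typical one or two differences per thousand base pairs.
fold_vs_typical_high = peak_rate / 1
fold_vs_typical_low = peak_rate / 2

print(f"peak rate: {peak_rate} differences per 1,000 base pairs")
print(f"{fold_vs_typical_low} to {fold_vs_typical_high} times the typical rate")

# Toy example: three differences fall in the first window, one in the second.
print(flag_windows([10, 20, 30, 6000], region_bp=10_000))
```

On these numbers, the most variable region carries 15 differences per thousand base pairs, between 7.5 and 15 times the typical rate.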
Hillier notes that chimp genome sequence data was very helpful for their analysis of human chromosomes 2 and 4. Hillier, Wilson and others are leading the analysis of the chimp genome, which they expect to publish soon.
For example, a comparison of the human genome to the chimp genome and other previously produced genomes revealed a gene that only appears to be functional in the human and chimp genomes. Scientists have tentative evidence the gene may be used to make a protein in the brain and the testes. If researchers can confirm that the novel gene is used to make a protein, scientists will be eager to determine its potentially unique or important role in human and chimp physiology.
"Now that we have so many different genomes, we're really in a position to start to understand them so much better," Hillier says. "It really was the dark ages back in the 1990s when we had so little genomic sequence."
Hillier LW, et al. Generation and annotation of the DNA sequences of human chromosomes 2 and 4. Nature, April 7, 2005. Funding from the National Human Genome Research Institute supported this research.
By Michael Purdy