COLUMBUS, Ohio - A team of Ohio State University genetics researchers has produced a third map of the human genome, this one containing twice the number of genes proposed by two earlier maps and providing annotations that explain the function of all 66,000 genes.
In February, teams of researchers from Celera Genomics, a private biotechnology firm, and counterparts from the Human Genome Project, the federally funded effort to map the genome, published their findings in the prestigious journals Science and Nature respectively. The Ohio State report was published today on the website of the journal Genome Biology.
Both earlier reports proposed that the human genome consists of some 35,000 genes, far fewer than the 100,000 to 120,000 genes that researchers had long predicted.
The Ohio State effort, which involved a team of 13 researchers from the university and a bioinformatics company, adds a third major map of the human genome and may accelerate the use of the genome in the diagnosis and understanding of diseases.
"We ended up with a higher estimated number of genes than the other two teams because we compared 13 different gene databases to the DNA sequences in the draft genome produced by the Human Genome Project," said Bo Yuan, head of Ohio State's Division of Human Cancer Genetics bioinformatics group. Yuan led the project.
To help understand the process followed by Yuan and the other two teams, think of the genome as a copy of James Joyce's lengthy novel Ulysses. Each chromosome would be a chapter, each gene a sentence.
The draft version of the genome's DNA sequences that was assembled by scientists at the Human Genome Project would then resemble a copy of Ulysses that lacked all punctuation and spacing. Each of this book's chapters would consist of one long string of letters.
To identify the sentences in that long continuous string, scientists would turn to databases of complete or partial sentences assembled by other researchers. The scientists would then use computers to match the fragments from the databases to the string of letters in each chapter of the novel.
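The matching step in this analogy can be sketched in a few lines of Python. This is only an illustration of the analogy, not the researchers' method: the function name, the sample "chapter" and the two small "databases" are hypothetical, and the sketch uses exact string matching, whereas real genome work relies on alignment software that tolerates mismatches.

```python
def locate_fragments(text, fragments):
    """Return {fragment: [start positions]} for every fragment found in text."""
    hits = {}
    for fragment in fragments:
        positions = []
        start = text.find(fragment)
        while start != -1:
            positions.append(start)
            # Keep searching just past the previous hit to catch repeats.
            start = text.find(fragment, start + 1)
        if positions:
            hits[fragment] = positions
    return hits


# Toy "chapter": words run together with no spacing or punctuation.
chapter = "statelyplumpbuckmulligancamefromthestairhead"

# Two hypothetical fragment databases; "dublin" does not occur in the chapter.
database_a = ["stately", "mulligan"]
database_b = ["stairhead", "dublin"]

hits = locate_fragments(chapter, database_a + database_b)
for fragment, positions in hits.items():
    print(fragment, positions)
```

Searching with both databases locates three fragments where either database alone would locate fewer, which is the intuition behind consulting more sources of evidence.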
The genome maps published in Science and, particularly, in Nature relied mainly on just two databases to identify genes. The Ohio State researchers used these databases plus 11 others.
For example, the Ohio State researchers used a rodent gene database, which provided evidence for 1,437 possible genes in the human genome.
"We used more experimental evidence in assembling our map, and that suggests that there are probably between 65,000 and 75,000 transcriptional units," said Yuan.
The "transcriptional unit" Yuan refers to is a length of DNA that shows strong evidence of being a gene but still requires verification.
"Some researchers are unsettled by the certainty with which the Human Genome Consortium is presenting its lower gene count," said Fred Wright, assistant professor of human cancer genetics and lead author of the paper.
"In my view, the final number of genes, when it is known, will lie somewhere between their high of 40,000 and our value of 70,000."
The Ohio State map would have taken far longer to assemble without the help of supercomputers at the Ohio Supercomputing Center. The work required four full weeks of supercomputing time. "Without that capability, the task would probably have taken at least a year," said Yuan.
"The computations involved millions of DNA sequences and were extremely time-consuming," said Wright. "One of the databases had over 2 million sequences, each of which had to be searched against the entire 2.8 billion base pairs in the genome draft. Figuring out where those 2 million sequences belonged was, by itself, a major computational task."
The Ohio State map also contains revealing information about tissue-specific genes, genes that are active in some tissues but not in others.
"This has important implications for biology and for disease mapping," said Wright. "Genes that are expressed everywhere in the body are probably more fundamentally important, so if they were defective, the person would probably be dead."
A defect in a gene that is tissue-specific, on the other hand, might leave the person otherwise healthy but with a disease only in that particular tissue.
For example, the OSU researchers found that five of ten genes that are specific to the retina in the eye have been identified as involved in eye function. Furthermore, scientists have linked defects in four of these genes to certain eye diseases, Wright said, "and perhaps the fifth one as well."
While the remaining five genes are known to be active in the retina, their exact function remains unknown. "But they are probably important as well in how the eye functions, and when damaged, they may lead to eye disease," said Wright. By knowing where these lesser-understood genes are located in the genome, researchers can investigate them further.
Contact: Bo Yuan, 614-292-0656; Yuan.firstname.lastname@example.org. Written by Darrell E. Ward, 614-292-8456
Editor's note: The citation is Genome Biology 2001 2(7): research0025.1-0025.18.