Knowing this ancestral mammal's complete genome--the sequence of As, Cs, Ts, and Gs in the DNA that made up its chromosomes--would not mean that scientists could bring it to life. (For one thing, synthesizing that much DNA would be prohibitively expensive and technically difficult.)
But that's not the point. The point is to understand the evolution of humans and other mammals at the molecular level, said David Haussler, professor of biomolecular engineering at the University of California, Santa Cruz.
"We will be able to trace the molecular evolution of our genome over the past 75 million years. It's a very exciting new way to think about our origins, a kind of DNA-based archaeology to understand how we came to be," said Haussler, a Howard Hughes Medical Institute (HHMI) investigator and director of UCSC's Center for Biomolecular Science and Engineering.
Haussler and Mathieu Blanchette, a postdoctoral researcher at UCSC who is now at McGill University, teamed up with Eric Green, scientific director at the National Human Genome Research Institute (NHGRI) and director of the NIH Intramural Sequencing Center, and Webb Miller, professor of biology and computer science and engineering at Pennsylvania State University, to look at the possibility of reconstructing the ancestral mammalian genome. A paper describing their findings appears in the December issue of the journal Genome Research.
The study is an extension of ongoing research in comparative genomics--the effort to understand the human genome by comparing it with the genomes of other species. By comparing the human genome to the ancestral genome, scientists may learn much more than they can from comparisons with other living species, such as the mouse, rat, and chimpanzee, Haussler said.
"If we find a DNA sequence in the human genome that is missing in the corresponding place in the mouse genome, we can't tell whether that DNA was inserted in the evolution of humans from the mammalian ancestor or deleted in the evolution of mice," he said. "If the ancestral genome is available, this ambiguity disappears."
The researchers developed a computational procedure for reconstructing ancestral genome sequences that was based primarily on Miller's widely used genome comparison software, together with an assortment of new computer algorithms devised for this project. To test the reconstruction process, they created an artificial set of mammalian genome sequences for which the ancestral sequence was known.
Blanchette, who is the first author of the paper, generated this artificial evolutionary tree by creating a massive software program to simulate all the known processes that modify DNA as it evolves. The program was based on a huge amount of data from Green's research at NHGRI, where scientists are doing comparative analyses of genomic sequences from many different vertebrate species. In particular, the researchers focused on a region called the CFTR locus, which includes the gene involved in cystic fibrosis. This region--encompassing ten genes and adjacent stretches of DNA, for a total of more than one million base pairs or "letters" of genetic code--had been completely sequenced in many different mammals.
"This region served as an example of the evolutionary changes that happened in all these different mammalian lineages. [Blanchette] took all of the information we learned from detailed examination of this one region and incorporated it into a software program that is able to simulate mammalian evolution at the molecular level," Haussler said.
The researchers applied the software to a hypothetical ancestral DNA sequence, artificially evolving the DNA along all the various pathways of mammalian evolution to create simulated modern DNA sequences for 20 different species. Then they used their reconstruction procedure (a completely separate process that incorporates no information from the simulation process) to recreate the ancestral sequence. Comparing the reconstructed sequence with the original ancestral sequence, they found that it was 98 percent accurate.
The next step was to apply the reconstruction process to the actual genomic sequences for the CFTR locus in the 19 mammalian species for which this DNA sequence was available, including humans. The researchers evaluated the accuracy of this reconstruction by comparing it to species not included in the group from which it was derived, such as chickens and opossums. These comparisons suggested an accuracy of more than 99 percent in the places they could test, including the cystic fibrosis gene itself.
"Overall, we think that we can reconstruct the DNA sequences of the ancestral genome with an accuracy of more than 98 percent," Haussler said. "That is much higher accuracy than people expected would be possible."
Haussler said he hopes these findings will encourage the additional genome sequencing that would be needed to do a complete reconstruction of the ancestral mammalian genome. Living mammals, from apes to bats to whales, are all variations on a common mammalian theme. Comparisons with their common ancestor can reveal new insights into not only the core biology that all mammals share in common, but also the unique traits that define each species, he said.
Nearly complete genome sequences are available now for five mammals, and about 20 would be needed for an accurate reconstruction. NHGRI and other organizations are planning to fund sequencing projects for a number of additional mammals, but it remains to be seen whether and how soon enough genomic data will be available, Haussler said.
"It's a great challenge to reconstruct the history of the entire human genome, but it's the key to understanding what makes us special," he said.