A tree of life shows how living things have evolved since the origin of life billions of years ago, grouping related organisms on the same branch. Such trees provide an organizing framework for biology. They can be used to predict the properties of poorly known species and are powerful tools for tasks such as drug discovery, said Michael Sanderson, professor of evolution and ecology at UC Davis and senior author on the paper.
Comparing protein and DNA sequences is a powerful tool for creating trees, because large amounts of data can be generated quite easily. But existing databases contain big gaps, because some organisms have been studied very heavily while many others are represented by single entries or not at all.
Postdoctoral researcher Amy Driskell and colleagues from Sanderson's laboratory analyzed more than 300,000 protein and DNA sequences deposited by scientists in the GenBank and Swiss-Prot public databases. Despite big gaps in the data, with many groups of organisms represented by a single sequence, they found it was still possible to construct useful trees starting from samples of 16,000 green plants and 7,500 plants and animals.
"It's pretty surprising that you can draw conclusions from such a small amount of information, compared to how much there would be if the databases contained a better sample of biodiversity," Sanderson said.
Researchers can now take the same approach and add more information from sequence databases, increasing the resolution of phylogenetic trees, he said.
The other authors on the paper are Cécile Ané, now assistant professor of statistics at the University of Wisconsin, Madison; UC Davis postdoctoral researchers Gordon Burleigh and Michelle McMahon; and graduate student Brian O'Meara, also at UC Davis. The work is supported by grants from the National Science Foundation.