DNA sequencing has revealed a vast amount of information about biology. But genome sequencing remains expensive and time-consuming, so scientists need a strategy to help them select the organisms that will give them the most new information.
One solution is to sequence the most distantly related organisms, to get the widest possible diversity of sequences. Biologists represent the relationships between different species as a tree, with the length of the branches varying according to the degree to which their DNA sequences differ. "If we are prepared to assume that the most informative set is the one with the greatest evolutionary divergence, the problem of which species to sequence next can be solved by observing the length of the branches that separate the unsequenced species from those that have already had their genomes sequenced, and choosing the organism that's separated from the others by the longest sequence of branches", explains Fabio Pardi.
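The selection rule Pardi describes can be illustrated with a minimal Python sketch. It is not the researchers' own software: the toy tree, its branch lengths, the species names A to E and the function names are all hypothetical, and "divergence" is assumed here to mean the total length of the branches connecting a set of species.

```python
# Hypothetical toy tree: each node maps to (parent, branch length).
# Leaves A-E stand in for species; n1-n3 are internal nodes.
TREE = {
    "n1": ("root", 1.0), "n2": ("root", 2.0),
    "A": ("n1", 2.0), "B": ("n1", 3.0),
    "C": ("n2", 1.0), "n3": ("n2", 1.5),
    "D": ("n3", 4.0), "E": ("n3", 0.5),
}


def path_to_root(node):
    """Branches (named by their lower node) from a node up to the root."""
    path = []
    while node in TREE:
        path.append(node)
        node = TREE[node][0]
    return path


def diversity(species):
    """Assumed score: total length of the branches connecting the species."""
    counts = {}
    for s in species:
        for branch in path_to_root(s):
            counts[branch] = counts.get(branch, 0) + 1
    # A branch connects the set only if some, but not all, species sit below it.
    return sum(TREE[b][1] for b, c in counts.items() if c < len(species))


def next_to_sequence(sequenced, candidates):
    """Pick the candidate separated from the sequenced set by the most branch length."""
    base = diversity(sequenced)
    gains = {c: diversity(sequenced | {c}) - base for c in candidates}
    return max(gains, key=gains.get), gains


choice, gains = next_to_sequence({"A", "C"}, {"B", "D", "E"})
print(choice, gains)
```

With A and C already sequenced, species D is chosen in this toy tree because 5.5 units of branch length separate it from the sequenced pair, compared with 3.0 for B and 2.0 for E.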
The tendency has been for centres to choose a group of new genomes to sequence. However, the current study shows that picking the best candidates one at a time is equally informative. "Computer scientists call this a 'greedy strategy' because it involves always taking the best bet for yourself", says Nick Goldman. "However, if, say, a centre had enough funding to sequence five organisms, we might expect to get a better set of genomes by considering all five together. Counter-intuitively, we found that in this case the greedy strategy is the best. We were surprised because in computer science greed is definitely not good - greedy algorithms seldom provide the best solution to a problem."
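Goldman's comparison between choosing one genome at a time and considering a whole batch together can be checked on a small example. The Python sketch below is illustrative only: the tree, the species names and the scoring function (total length of the branches connecting the chosen species) are assumptions, and agreement on one toy tree demonstrates the claim rather than proving it.

```python
from itertools import combinations

# Hypothetical toy tree: each node maps to (parent, branch length).
TREE = {
    "n1": ("root", 1.0), "n2": ("root", 2.0),
    "A": ("n1", 2.0), "B": ("n1", 3.0),
    "C": ("n2", 1.0), "n3": ("n2", 1.5),
    "D": ("n3", 4.0), "E": ("n3", 0.5),
}


def diversity(species):
    """Assumed score: total length of the branches connecting the species."""
    counts = {}
    for node in species:
        while node in TREE:  # walk up to the root
            counts[node] = counts.get(node, 0) + 1
            node = TREE[node][0]
    # Count a branch only if some, but not all, species sit below it.
    return sum(TREE[b][1] for b, c in counts.items() if c < len(species))


def greedy_pick(sequenced, candidates, k):
    """Add k species one at a time, always taking the best single addition."""
    chosen, picks = set(sequenced), []
    for _ in range(k):
        remaining = sorted(set(candidates) - chosen)
        best = max(remaining, key=lambda c: diversity(chosen | {c}))
        picks.append(best)
        chosen.add(best)
    return picks, diversity(chosen)


def exhaustive_pick(sequenced, candidates, k):
    """Score every possible group of k species and keep the best group."""
    best = max(combinations(sorted(candidates), k),
               key=lambda combo: diversity(set(sequenced) | set(combo)))
    return list(best), diversity(set(sequenced) | set(best))


sequenced, candidates = {"C"}, {"A", "B", "D", "E"}
for k in range(1, 5):
    _, greedy_score = greedy_pick(sequenced, candidates, k)
    _, best_score = exhaustive_pick(sequenced, candidates, k)
    print(k, greedy_score, best_score)
```

For every budget from one to four genomes, the one-at-a-time choice reaches the same score as the exhaustive search on this tree, while examining far fewer combinations.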
"Our findings have clear implications for planning large-scale genome sequencing efforts", continues Pardi. "Provided that they remain open about their choices so that two different sequencing centres don't choose the same genome, selecting the next most attractive organism to sequence is just as effective as having a long-term strategy."
Evolutionary divergence isn't the only factor that scientists consider when choosing which genomes to sequence, but other criteria can be factored into Goldman and Pardi's greedy strategy so long as those criteria can be quantified. For example, sequencing costs, or the economic importance of an organism, could be considered. Their strategy can also be applied to problems in other fields, such as conservation biology. "Of course, we're not advocating that genome scientists or conservation biologists stop working cooperatively, but at least they can feel confident about sequencing or conserving the organism of their choice without messing things up for their collaborators", says Goldman.