The flightless kakapo of New Zealand is in trouble. The world's heaviest parrot--representing one of the most ancestral branches of the parrot family tree--is nearly extinct, with barely 200 adults plodding the underbrush of four small islands. Whether the last of the kakapos had the genetic resilience to survive was a question that only high-quality genomic analysis could answer.
But a high-quality genome assembly did not exist for the kakapo--nor for most of the 70,000 vertebrate species alive today.
Questions about how best to prevent the extinction of species ranging from the flightless kakapo to the small, adorable vaquita porpoise were left unanswered. Had inbreeding left these populations genetically non-viable? Were humans the only reason that these animals stood on the brink of extinction, or was something inherently broken in their DNA?
The solution was the Vertebrate Genomes Project, an ambitious initiative launched to generate high-quality reference genomes for every extant vertebrate species. Now, in a flagship study published in Nature, the project's researchers present methods and principles for sequencing and assembling high-quality reference genomes.
The team has applied these methods and principles to produce 16 high-quality reference genomes, including one for the endangered kakapo, to help reveal whether it is hardy enough to rebuild its population. The researchers found that extremely small populations of the kakapo and the vaquita have survived at low numbers since the last ice age, more than 10,000 years ago, by purging deleterious mutations that cause inbreeding-related disease. As long as humans do not kill off more of the last remaining animals, findings from the high-quality reference genomes give hope that these species could survive even starting from fewer than 100 individuals each.
"We call it the 'kitchen sink approach'--combining tools from several biotech companies to make this one high-quality genome assembly pipeline," says Rockefeller's Erich D. Jarvis, Chair of the Vertebrate Genomes Project. "Endangered species were the first to benefit from the new technology because, even though conservation is not my area of research, I felt it was a moral duty."
Working with low-quality genomes
High-quality reference genomes exist only for the celebrities of laboratory science--mice, fruit flies, zebrafish, and, of course, humans. For less popular species, there is often no reference genome or, perhaps worse, a messy genome stitched together from sequences obtained via quick and dirty methods. Compared to the new VGP genomes, up to 60 percent of the genes in such genomes have missing sequence, are entirely missing, or are incorrectly assembled, the researchers found. It can take years to untangle the thousands of assembly errors per species.
Many false gene duplications were found, most caused by algorithms that do not properly separate maternal and paternal chromosome sequences and instead interpret them as two separate sister genes. "We have thousands of genes in the literature that are false duplications. The genes are not actually there!" Jarvis says. "It is unconscionable to be working with some of these genomes."
The Vertebrate Genomes Project was born from the frustrations of hundreds of scientists working in its parent organization, the Genome 10K consortium, whose mission was to generate genome assemblies of 10,000 vertebrate species. The initial genome assemblies that the G10K and other groups generated were based on short reads of 35 to 200 base pairs, and these assemblies were highly incomplete. The VGP's goal is to build a library of error-free reference genomes for all vertebrate species, which researchers and conservationists will be able to use readily, without dedicating months or years to fixing individual genes. "We said, let's do some hard work on the front end, so that we can get high quality data on the back end," Jarvis says.
The "kitchen sink" approach
Many companies approached the Vertebrate Genomes Project, promising a single sequencing technology that would solve every problem with messy reference genomes. The Vertebrate Genomes Project assembly team tested each method on a single hummingbird, chosen both for its relatively small genome and because of Jarvis' research interests in vocal learning among bird species ("two birds with one stone," he quips). But every technology fell short. "None had all of the necessary components to make a high-quality assembly," Jarvis says. "So we combined many tools into one pipeline."
Their approach works. Organizations including the Earth BioGenome Project, the Darwin Tree of Life Project, and the New Zealand Genome Sequencing Project are already using the most advanced version of the novel pipeline. Reference genomes that once took years to generate are now rolling out in weeks and months--all without the false duplications and other errors endemic to previous assemblies. Scientists are already using the new data to study genes that render bats immune to COVID-19, and to question long-standing conventions in basic science, such as whether there are meaningful differences between oxytocin and its receptors found in humans, birds, reptiles, and fish.
All told, 20 studies and 25 high-quality vertebrate genomes accompany the rollout of the novel pipeline. "The first high-quality genomes that we sequenced taught us so much about the technology and the biology that we decided to publish in these initial papers," Jarvis says. But plenty of work still lies ahead. "The next step is to sequence all 1,000 vertebrate genera, and then all 10,000 vertebrate families, and eventually every single vertebrate species."