Whole genome duplication followed by massive gene loss has shaped many genomes, including the human genome. Why some gene duplicates are retained while most perish has puzzled scientists for decades.
A study, published today in Science, has found that gene retention depends on the degree of "functional and structural entanglement", which measures the interdependency between gene structure and function. In other words, while most duplicates either become obsolete or evolve new roles, some are retained forever because, evolutionarily speaking, they're simply stuck.
"When we scan genomes there are some gene pairs that remain from whole genome duplication events that occurred millions of years ago," says Elena Kuzmin, a co-lead author of the study. Kuzmin trained as a graduate student with Charles Boone, a professor of molecular genetics in the Donnelly Centre for Cellular and Biomolecular Research at the University of Toronto, who co-led the study.
"Why are some duplicates retained while most are eliminated? We tried to find some of the reasons for this retention to help us understand evolutionary forces that shape genomes," says Kuzmin, who is now a postdoctoral fellow at the Goodman Cancer Research Centre at McGill University.
The study was also led by Brenda Andrews, University Professor and Director of the Donnelly Centre, and Chad Myers, a professor of computer science at the University of Minnesota-Twin Cities.
Whole genome duplication is seen as a major source of raw genetic material for evolution to act on. Duplicated gene copies, also known as paralogs, can be found across eukaryotes, organisms that include single-celled yeasts and all multicellular forms of life.
Over extensive evolutionary time, through random mutation, the DNA code of one gene copy diverges from another until they are no longer recognized as duplicates. They either evolve new roles or decay into the non-coding part of the genome.
But some duplicates are retained, suggesting there may be an evolutionary advantage to the organism in keeping both. There is little agreement among scientists, however, about why this might be the case. Many evolutionary biologists think that all paralog pairs will eventually revert to single-copy genes.
Genome evolution is not easy to study. "None of us were there to see what really happened with these genes," says Boone. But he believes clues can be gained from studying functional relationships between paralogs and other genes in the genome.
The researchers turned to Saccharomyces cerevisiae, or baker's yeast, whose relatively small genome makes such studies feasible. Most of its 6,000 genes exist as single copies, but 551 paralog pairs have remained from a duplication event some 100 million years ago.
Because paralogs started as identical gene copies, their function should to some extent be overlapping, or redundant. This indeed is the case for some yeast paralogs. In their 2016 Science paper, the team showed that 331 paralog pairs are redundant so that deleting either gene had no effect on the cells, whereas deleting both reduced their survival. This suggested that some paralogs are retained as essential backup in case either gene copy is lost.
But 240 paralog pairs are non-essential, that is, both can be deleted with no effect on cell survival. To unpick their functional requirement, the researchers looked for a context in which removing both genes is detrimental to cell fitness. They found it in triple mutant yeast lacking three genes: a paralog pair plus another gene.
Fitness analysis of 550,000 double and 260,000 triple mutant strains revealed that the non-essential paralogs fall into two basic classes: those with overlapping roles, and those that have diverged and acquired new functions.
A closer look at the genes' DNA code revealed that the ability of non-essential paralogs to evolve new roles is determined by the molecular structure of the proteins they encode. The authors coined the term "functional and structural entanglement" to describe how much a gene's cellular function is constrained by the intrinsic physical forces acting on its protein product.
Membrane proteins, such as ion channels and receptors, are an example of this. These proteins typically contain multiple hydrophobic, or water-repelling, segments, which allow them to mix with hydrophobic lipid molecules that make up the cell membrane. Mutations to the underlying genes are likely to alter the proteins' hydrophobic nature, impairing their ability to be inserted into the membrane and rendering them nonfunctional as a result.
"Somehow these paralogs retain this functional overlap because they can't get rid of their basic structure or they lose everything," says Boone.
"Our study is an extensive experimental analysis of the functional redundancy associated with retained paralogs. We can see different features associated with the genes in the two groups that suggest that the functional and structural entanglement model is valid," he says.
Computational modelling performed by the study's other co-lead author, Benjamin VanderSluis, a former postdoctoral fellow with Myers, aligned with the experimental findings.
The greater the structural entanglement, the modelling showed, the greater the chance that a random mutation will harm protein function. Consequently, one paralog will be under strong evolutionary pressure to remain intact, while the other will become nonfunctional over time.
At the other end of the spectrum, the least structurally entangled paralogs have more freedom to evolve new roles and divide up the ancestral functions between them. At intermediate levels of entanglement, paralogs retain overlapping functions and coexist in a steady state.
The functional and structural entanglement model predicts that some paralog pairs will be maintained indefinitely. It challenges the widely accepted view that all paralog pairs will eventually revert to a single gene state.
"We show that functional redundancy between paralogs is evolutionarily stable and can exist at steady state," says Kuzmin.