More than 120,000 novel human genetic variations that affect large regions of DNA have been discovered, some of which are linked to immune response, disease susceptibility or digestion. Scientists at the Wellcome Sanger Institute identified these changes affecting multiple bases of DNA, known as structural variations, in a study of the most diverse worldwide populations examined to date. This included variations in medically important genes in populations from Papua New Guinea that were inherited from Denisovan ancestors.
The resource, published today in Cell (11 June), adds new regions of sequence to the human reference genome, which is the world standard for human genetics but remains incomplete. These previously unknown variations in medically important genes, which could affect the efficacy of medical treatments in certain populations, will be a valuable resource for the field of precision medicine around the globe.
Structural variations are genetic changes that can encompass anything from a few to millions of base pairs of DNA and are therefore particularly likely to affect how genes function. Some genes, such as those that influence immune response, are considered to be 'medically important'. DNA changes affecting how these genes function can lead to health problems or increased resistance or susceptibility to particular diseases.
Until now, most large-scale genetic studies have focused on changes affecting single base pairs of DNA.
Scientists at the Wellcome Sanger Institute had previously led the sequencing of 911 genomes from 54 geographically, linguistically and culturally diverse populations from across the globe, and have now searched for structural variations* in these sequences.
The sequences were compared to the human reference genome to create a catalogue of structural variations, over three quarters of which were previously unknown. The team then investigated how common these structural variations are in each of the 54 populations, and which of them were inherited from Neanderthal or Denisovan ancestors.
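As a rough illustration of what "how common" means here, the frequency of a variation in a population can be estimated by counting the copies carried by the sampled individuals and dividing by the number of chromosomes sampled. The sketch below is not the study's pipeline: the function name, the sample identifiers, the population labels and the toy genotypes are all hypothetical, and it assumes a variation on a non-sex chromosome, so each person carries two copies of the region.

```python
from collections import defaultdict

def population_frequencies(genotypes, sample_population):
    """Return {population: frequency} for a single variation.

    genotypes: {sample_id: copies of the variation carried (0, 1 or 2)}
    sample_population: {sample_id: population label}
    Both inputs are hypothetical structures chosen for illustration.
    """
    copies = defaultdict(int)
    chromosomes = defaultdict(int)
    for sample, count in genotypes.items():
        pop = sample_population[sample]
        copies[pop] += count
        chromosomes[pop] += 2  # assumption: two chromosomes per person
    return {pop: copies[pop] / chromosomes[pop] for pop in chromosomes}

# Made-up toy data, not taken from the study.
genotypes = {"s1": 2, "s2": 1, "s3": 0, "s4": 0}
sample_population = {"s1": "PopA", "s2": "PopA", "s3": "PopB", "s4": "PopB"}

freqs = population_frequencies(genotypes, sample_population)
print(freqs)
```

In this toy example the variation sits on three of the four chromosomes sampled from PopA and on none of those from PopB, the kind of contrast that would mark a variation as high-frequency in one population and absent in another.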
Among the 126,018 structural variations discovered were medically important variations inherited from Denisovan ancestors in Oceanian populations from Papua New Guinea and nearby, including a high-frequency deletion in AQR, a gene that plays a role in the detection of viruses and the regulation of the antiviral immune response.
Mohamed Almarri, first author of the study and PhD student at the Wellcome Sanger Institute, said: "By analysing the genomes of understudied populations we've been able to find high-frequency structural variations not uncovered by previous large-scale sequencing projects. Several of these are in medically-important genes that tell us how a population has evolved to resist a certain disease or why they might be susceptible to others. This is vital knowledge and will help to ensure that treatments can be tailored to each specific population."
The study uncovered other notable structural variations that, together with existing knowledge of human evolution and the role of specific genes, shine a light on how individual populations have evolved.
The Karitiana people, who reside in modern-day Brazil, were found to carry a variation in the MGAM gene that affects starch digestion. The Karitiana diet is derived from fishing, hunting and farming, so a decrease in starch digestion is probably disadvantageous and therefore surprising. It is thought that bad luck may have concentrated this variation in the small population that survived a population crash within the last 5,000 years.
The team also discovered novel 'runaway duplications', where populations have evolved to carry multiple copies of genes. For example, all of the African populations included in the study carried multiple copies of the HPR gene, which is associated with resistance to sleeping sickness**. The highest numbers of copies (up to nine) were carried by Central and West African populations, where the disease is most prevalent.
Dr Ed Hollox, an expert in the field from the University of Leicester, said: "This is a very valuable study showing the importance of structural variation of the human genome in the genetic diversity of humans around the world. The work supports the concept that some human adaptations to different environments are due to the loss or gain of whole genes, or parts of genes. Structural variation can be challenging to find, and this study also provides a well-founded structural variation reference set which will serve as an important springboard for future studies."
The study adds almost two million newly identified base pairs to the human reference genome sequence. Because the human reference genome was assembled from a small number of people, regions of DNA that were not present in these individuals are missing from the reference sequence.
The team recreated 25 diverse human genomes from scratch using a recent technological innovation called de novo genome assembly. By directly comparing these assembled genomes to the reference, the researchers were able to identify missing sequences present in multiple populations. This illustrates the limitation of a single human reference and the need for high-quality reference genomes from diverse populations.
Dr Yali Xue, recently retired from the Wellcome Sanger Institute, said: "Structural variants are complicated yet very important functionally, evolutionarily and medically. The discovery of these new structural variations provides one of the richest resources of this kind of variation so far, which not only offers unique insights into population histories and improves the currently used human reference genome, but will also substantially benefit future medical studies."
Notes to Editors:
*The genomes were sequenced from The Human Genome Diversity Project (HGDP)-CEPH panel, a collection of cell lines from diverse human populations for use in human genetic history and medical research. The cell lines are available to all researchers and are held at the Centre d'Etude du Polymorphisme Humain (CEPH) in Paris. http://www.cephb.fr/en/hgdp_panel.php#presentation
**Trypanosome infection, commonly known as sleeping sickness, is a disease affecting sub-Saharan Africa. Without treatment the disease is usually fatal. More information is available from the WHO: https://www.who.int/news-room/fact-sheets/detail/trypanosomiasis-human-african-(sleeping-sickness)
Mohamed A. Almarri, Anders Bergström and Javier Prado-Martinez et al. (2020). Population Structure, Stratification and Introgression of Human Structural Variation. Cell. DOI: https://doi.org/10.1016/j.cell.2020.05.024
This research was funded by Wellcome.
The Wellcome Sanger Institute
The Wellcome Sanger Institute is a world-leading genomics research centre. We undertake large-scale research that forms the foundations of knowledge in biology and medicine. We are open and collaborative; our data, results, tools and technologies are shared across the globe to advance science. Our ambition is vast - we take on projects that are not possible anywhere else. We use the power of genome sequencing to understand and harness the information in DNA. Funded by Wellcome, we have the freedom and support to push the boundaries of genomics. Our findings are used to improve health and to understand life on Earth. Find out more at http://www.sanger.ac.uk.
Wellcome exists to improve health by helping great ideas to thrive. We support researchers, we take on big health challenges, we campaign for better science, and we help everyone get involved with science and health research. We are a politically and financially independent foundation. https://wellcome.ac.uk/