Some regions of the human genome where the DNA can fold into unusual three-dimensional structures called G-quadruplexes (G4s) show signs that they are preserved by natural selection. When G4s are located in the regulatory sequences that control how genes are expressed or in other functional, but non-protein coding, regions of the genome, they are maintained by selection, are more common, and form more stable structures, according to a new study. Conversely, outside of these regions, including within the protein-coding regions of genes themselves, the structures are less common, are less stable, and evolve neutrally.
Together, these lines of evidence suggest that G4 elements should be added to the list of functional elements of the genome along with genes, regulatory sequences, and non-protein coding RNAs, among others. A paper describing the study, by a team of researchers led by Penn State scientists, appears June 29, 2021, in the journal Genome Research.
"There have been only a handful of studies that provided experimental evidence for individual G4 elements playing functional roles," said Wilfried Guiblet, first author of the paper, a graduate student at Penn State at the time of the research, and now a postdoctoral scholar at the National Cancer Institute. "Our study is the first to look at G4s across the genome to see if they show the characteristics of functional elements as a general rule."
As much as 1% of the genome can fold into G4s, rather than the typical double helix (in comparison, protein-coding genes occupy approximately 1.5% of the genome). G4s are one of several non-canonical shapes into which DNA can fold, collectively known as "non-B DNA." The G4 structure forms in DNA sequences rich in the nucleotide guanine, the "G" in the ACGT alphabet of the genome. G4s have been implicated in several key cellular processes and have been suggested to play a role in several human diseases, including neurological disorders and cancer.
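To make "sequences rich in guanine" concrete, the short Python sketch below scans a DNA string for a widely used G4 sequence motif: four runs of at least three guanines separated by loops of one to seven nucleotides. This motif, the function name `find_g4_candidates`, and the example sequence are assumptions chosen for illustration; they are not taken from the study, which may have identified and scored G4s differently. The sketch also searches only the strand it is given, and a sequence match says nothing about whether the structure actually forms or how stable it is.

```python
import re

# Illustrative assumption, not the study's method: a commonly used G4 motif of
# four guanine runs (3+ Gs each) separated by loops of 1 to 7 nucleotides.
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_candidates(sequence):
    """Return (start, end, text) for non-overlapping motif matches.

    Only the given strand is searched; coordinates are 0-based, end-exclusive.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


# Hypothetical example: a short stretch of TTAGGG repeats.
example = "TTAGGGTTAGGGTTAGGGTTAGGGTT"
hits = find_g4_candidates(example)
for start, end, text in hits:
    print(start, end, text)
```

A sequence with no qualifying guanine runs, such as "ACGTACGT", returns an empty list.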
To better understand the function of G4s at a genome-wide scale, the research team looked at their distribution across the genome, their thermostability, and whether or not they showed signs of being under the influence of natural selection, all in relation to other functional elements of the genome. They confirmed that, as a rule, G4s are more common in regions of the genome known to have important cellular functions and that the G4s in these regions are more stable than elsewhere in the genome.
"The three-dimensional structure of G4s can form transiently, and how stable their structure is depends on their underlying DNA sequence and other factors," said Guiblet. "We found that, usually, G4s located within functional regions of the genome tend to be more stable. In other words, it's more likely that the DNA is folded into a G4 at any given time and thus, more likely that the G4 is there for a functional reason."
Functional regions of the genome are generally maintained by a type of natural selection called purifying selection. Mutations in these regions could disrupt their function and be harmful to the organism. The mutations therefore are usually eliminated by purifying selection, which keeps the DNA sequence relatively unchanged over time. In nonfunctional regions of the genome, a mutation may have no impact and can persist in the genome without any consequences. These regions of the genome are said to evolve neutrally. Where G4s fall in this spectrum depends on their location in the genome.
"We can look at the patterns of change in a DNA sequence among human individuals and between humans and our close primate relatives as a test of natural selection and then use selection as an indicator of function," said Yi-Fei Huang, assistant professor of biology at Penn State and a leader of the research team. "Our tests show that G4s located within functional regions of the genome appear to be under purifying selection, which is further evidence that G4s should be considered as functional elements. The only exception to this pattern was protein-coding regions of genes, where G4s are relatively uncommon, rather unstable, and do not evolve under purifying selection. G4s in protein-coding regions of genes might be nonfunctional and costly to maintain."
The research team has recently shown that G4s, along with other types of non-B DNA, have increased mutation rates. The fact that G4s located outside of protein-coding regions are maintained by purifying selection, despite their high mutagenic potential, adds further weight to the evidence for classifying G4s as functional elements.
"We think that we are seeing evidence for a paradigm shift for how scientists define function in the genome," said Kateryna Makova, Verne M. Willaman Chair of Life Sciences at Penn State and a leader of the research team. "First, geneticists focused almost exclusively on protein-coding genes, then we became aware of many functional non-coding elements, and now we have G4s and possibly other non-B DNA elements. Three-dimensional structure may be just as important for defining function as the underlying DNA sequence."
"Defining the full complement of functional genome elements is crucial for interpreting the potential disease consequences not only of inherited genetic variants but also of mutations arising within tissues over the lifetime of individuals," said Kristin Eckert, professor of pathology at the Penn State College of Medicine, co-author of the paper, and a member of the research team. "The identification of G4s as novel functional elements within the human genome is key to advancing the use of genetics in precision medicine."
In addition to Guiblet, Huang, Makova, and Eckert, the research team includes Xiaoheng Cheng (now a postdoctoral researcher at the University of Chicago) and Francesca Chiaromonte, at Penn State, and Michael DeGiorgio at Florida Atlantic University. The study was funded by the U.S. National Institutes of Health, the Clinical and Translational Sciences Institute, the Institute of Computational and Data Sciences, the Huck Institutes of the Life Sciences at Penn State, the Penn State Eberly College of Science, and the U.S. National Science Foundation, and it also was supported by the CBIOS Predoctoral Training Program awarded to Penn State by the National Institutes of Health.