Based on genome-wide experiments, the human body has 2,064 genes relevant to COVID-19. So why are researchers only studying 611 of them?
A historical bias -- which has long dictated which human genes are studied -- is now affecting how biomedical researchers study COVID-19, according to new Northwestern University research.
Although biomedical researchers know that many overlooked human genes play a role in COVID-19, they currently do not study them. Instead, researchers who study COVID-19 continue to focus on human genes that have already been heavily investigated independent of coronaviruses.
"For understandable reasons, researchers tend to build upon existing knowledge and research tools. They appear to select genes to study based on the ease of experimentation rather than their ultimate relevance to a disease," said Northwestern's Thomas Stoeger, who co-led the research. "This means that research into COVID-19 concentrates only on a small subset of the human genes involved in the response to the virus. Consequently, many aspects of the response of human cells toward COVID-19 remain not understood."
"There are many genes related to COVID-19, but we don't know what they are doing in the context of COVID-19," added Northwestern's Luís Amaral, who co-led the study with Stoeger. "We didn't study these genes before the pandemic, and COVID-19 does not seem to be an incentive to investigate them."
The research will be published on Nov. 24 in the journal eLife.
Stoeger is a data science scholar at the Northwestern Institute on Complex Systems (NICO) and the Center for Genetic Medicine. Through a "Pathway to Independence" award from the National Institute on Aging, Stoeger is starting a research laboratory dedicated to uncovering unstudied genes with important contributions to aging and age-related diseases. Amaral is the Erastus O. Haven Professor of Chemical and Biological Engineering in Northwestern's McCormick School of Engineering. Stoeger and Amaral are both members of the Successful Clinical Response in Pneumonia Therapy (SCRIPT) Systems Biology Center.
Pinpointing a historical bias
This study builds on Stoeger and Amaral's 2018 research, which was the first to explain why some human genes are more popular to study than others. In that work, they found that 30% of all genes have never been studied and that fewer than 20% of genes are the subject of more than 90% of published papers.
Despite the increasing availability of new techniques to study and characterize genes, researchers continue to study a small group of genes that scientists have studied since the 1980s. Historically, these genes have been easier to investigate experimentally. If an animal model has a gene similar to a human gene, for example, researchers are more likely to study that gene. The Northwestern team also discovered that postdoctoral fellows and Ph.D. students who focus on poorly characterized genes are 50% less likely to become independent researchers.
Although the Human Genome Project -- the identification and mapping of all human genes, completed in 2003 -- aimed to expand the scope of scientific study beyond this small subset of genes, it has yet to fulfill this aim.
"The bias to study the exact same human genes is very high," Amaral said. "The entire system is fighting the very purpose of the agencies and scientific knowledge, which is to broaden the set of things we study and understand. We need to make a concerted effort to incentivize the study of other genes important to human health."
Bias continues into COVID era
For the new study, Stoeger and Amaral turned to LitCOVID, a collection of research publications related to COVID-19, curated by the National Library of Medicine. LitCOVID tags genes mentioned in the titles, abstracts or results sections of individual publications.
Northwestern researchers analyzed 10,395 published papers and pre-prints from the collection. Then, they integrated them into a custom database along with more than 100 different biological and bibliometric databases in an effort to survey and measure all aspects of biomedical research. Finally, they compared genes mentioned in the COVID-19 papers to COVID-19-related genes as identified by four genome-wide studies.
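The comparison in the last step can be illustrated with a minimal sketch, assuming the literature tags and the genome-wide hits are each available as a plain collection of gene identifiers. The function name `coverage` and the identifiers `GENE_A` through `GENE_X` are hypothetical placeholders, not data or code from the study.

```python
def coverage(genome_wide_hits, literature_genes):
    """Split genome-wide hits into studied and overlooked genes.

    Returns (studied, overlooked, fraction_studied). This is an
    illustrative set comparison, not the study's actual pipeline.
    """
    hits = set(genome_wide_hits)
    mentioned = set(literature_genes)
    studied = hits & mentioned      # hits that also appear in the literature
    overlooked = hits - mentioned   # hits that no paper mentions
    fraction = len(studied) / len(hits) if hits else 0.0
    return studied, overlooked, fraction


# Made-up identifiers standing in for real gene symbols.
genome_wide_hits = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
literature_genes = {"GENE_A", "GENE_X"}

studied, overlooked, fraction = coverage(genome_wide_hits, literature_genes)
print(sorted(studied), sorted(overlooked), fraction)
```

Applied to the figures quoted at the top of this article, 611 studied genes out of 2,064 relevant ones works out to roughly 30%.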
Stoeger and Amaral also tracked the occurrence of genes appearing in COVID-19 literature over time. Surprisingly, they observed that studies of COVID-19 genes have become less expansive, not more, since the onset of the pandemic.
The team hopes its study inspires other researchers to be aware of past biases and to explore unstudied genes.
"Our findings have a direct implication on the long-term planning of scientific policymakers," Stoeger said. "We can point researchers toward human genes that are important for the cellular response against viruses but risk being ignored due to historically acquired biases, which are culturally reinforced."
The study, "Meta-Research: COVID-19 research risks ignoring important host genes due to preestablished research patterns," was supported by the SCRIPT Systems Biology Center (award number U19AI135964); the National Science Foundation (award number NSF 1956338); the Northwestern University Quantitative Biology Center (award number NSF/Simons DMS-1764421); the Air Force Office of Scientific Research (award number FA9550-19-1-0354); and the National Institutes of Health (award number NIH 1K99AG068544).