News Release

Finding the smallest genes could yield outsized benefits

Salk technique may reveal important health biomarkers and new disease targets

Peer-Reviewed Publication

Salk Institute


Image: This illustration represents the Saghatelian lab's method for finding genes known as small open reading frames (smORFs). The "microproteins" encoded by smORFs have been linked to immune function, cell stress and many other cellular processes, which suggests that detecting smORFs could lead scientists to new biomarkers and drug targets for human diseases.

Credit: Salk Institute

LA JOLLA--(December 9, 2019) While scientists know of about 25,000 genes that code for biologically important proteins, additional, smaller genes hiding in our DNA may be just as important. But these tiny lines of genetic code have proven tough to track down.

A new study from the Salk Institute identified over 2,000 new, small genes--expanding the number of human genes by 10 percent. These previously undetected genes are called small open reading frames (smORFs), and the scientists developed a method for detecting them in human cell lines.

"We've expanded the human genome," says Salk Professor Alan Saghatelian, co-corresponding author of the study, published in Nature Chemical Biology on December 9, 2019. "This work can really be applied to better understand human biology and may eventually have implications for diseases ranging from cancer to diabetes."

Over the last ten years, Saghatelian and his colleagues have been developing methods to better identify smORFs that affect human health. Already, "microproteins" encoded by smORFs have been linked to immune function, cell stress and even early muscle development. Saghatelian says there is growing evidence that detecting smORFs could lead scientists to new biomarkers and drug targets for human diseases.

Thomas Martinez, first author of the study and postdoctoral fellow in the Saghatelian lab, led the effort to use a technique called Ribo-Seq to see which smORFs actually encoded proteins in cells. Ribo-Seq is routinely used for detecting the production of larger proteins but proved less consistent for detecting smORFs. The team solved this problem by optimizing the experiment to detect smORFs more reliably, yielding the most robust estimate so far of the number of smORFs in the human genome.

Martinez's work made it possible to find smORFs in three human cell lines, derived from leukemia, ovarian cancer and immortalized kidney cells. Around 7,500 smORFs showed up in at least one cell line. Of those, around 1,500 appeared in at least two cell lines--and kept showing up when the researchers repeated their experiments. The reproducibility of the results gave the researchers confidence that these newly spotted genes really exist.

"We finally have reliable information that the human genome contains at least 2,500 to 3,500 smORFs," says Saghatelian.

The challenge now is to figure out which smORFs are involved in disease--and whether the microproteins they code for could be disease targets. Already, the researchers have identified around 500 smORFs that show up in all three cell lines, suggesting they could have important biological functions.

"Right now, our methods can tell us if a smORF exists or doesn't exist, but it doesn't give us a lot of information on what is actually related to disease," says Saghatelian. "Going forward, the lab will start doing more research to find smORFs that may be specific to diseases like cancer or diabetes."

Saghatelian says the science of smORFs is still in its early days, so the researchers hope other labs around the world will use their methods to hunt for smORFs in their own cell lines.

"This is really an unexplored area," says Martinez. "At the end of the day, you want to know what all the parts are in the genome."


Other authors of the study included Qian Chu, Cynthia Donaldson, Dan Tan and Maxim N. Shokhirev of Salk.

The research was supported by the National Institutes of Health (R01 GM102491, F32 GM123685), the Leona M. and Harry B. Helmsley Charitable Trust, Dr. Frederick Paulsen Chair/Ferring Pharmaceuticals, the George E. Hewitt Foundation for Medical Research, and Pioneer Fellowship. This work was also supported by the Razavi Newman Integrative Genomics and Bioinformatics Core and the Next Generation Sequencing Core Facilities of the Salk Institute with funding from the National Institutes of Health (P30 014195) and the Chapman Foundation.

About the Salk Institute for Biological Studies:

Every cure has a starting point. The Salk Institute embodies Jonas Salk's mission to dare to make dreams into reality. Its internationally renowned and award-winning scientists explore the very foundations of life, seeking new understandings in neuroscience, genetics, immunology, plant biology and more. The Institute is an independent nonprofit organization and architectural landmark: small by choice, intimate by nature and fearless in the face of any challenge. Be it cancer or Alzheimer's, aging or diabetes, Salk is where cures begin. Learn more at:
