News Release

Team plucks needle from genomic haystack, finding essential transcription factor binding sites

Peer-Reviewed Publication

Children's National Hospital

Using CRISPR/Cas9 knockout screens, a multi-institutional research team systematically interrogated the essentiality of more than 10,000 forkhead box protein A1 (FOXA1) and CTCF binding sites in breast and prostate cancer cells, plucking useful needles from a massive genomic haystack that contains millions of transcription factor binding sites. They found that essential FOXA1 binding sites act as enhancers to orchestrate the expression of nearby essential genes, the team reports Nov. 11, 2019, in PNAS.

"Ninety-nine percent of the human genome is non-coding DNA, which previously had been thought of as junk," says Wei Li, Ph.D., a principal investigator in the Center for Genetic Medicine Research at Children's National Hospital and co-lead study author. "We now know that the non-coding regions of the genome can play important roles in a lot of biological functions, including cancer cell growth. The problem is there was no good way to figure out which among the millions of candidates are important in the biology of cancer."

While previous techniques interrogated a few hundred non-coding genomic regions, Li says their team was able to test more than 10,000 sites in a single experiment.

Overall, the team found that 37 FOXA1 binding sites in T47D cells are essential, including 29 strong FOXA1 binding sites and eight binding sites near essential genes. Those genes include estrogen receptor 1, "the master transcription factor for ER+ breast cancer cells," and TRPS1, another transcription factor associated with ER+ breast cancer progression, the research team writes.

Li says the most exciting part of the work is the machine learning model the team developed to predict which potential transcription factor binding sites are most important, yielding clinically relevant information that may help patients in the future.

"We have only finished the first step. We need to improve our machine-learning model. We need to conduct many more experiments. We need to test on cell lines using experimental models. And, we eventually hope to launch clinical trials to validate our findings in humans," he says. "It will be years from now, but we hope our machine learning model can one day be used to tell a patient which of the variants located in their genome may affect their risk of getting cancer."


In addition to Li, study co-authors include co-lead author Teng Fei, Northeastern University in China; Jingyu Peng, Tengfei Xiao, Chen-Hao Chen, Alexander Wu, Jialiang Huang, X. Shirley Liu and senior author Myles Brown, all of Dana-Farber Cancer Institute; and Chongzhi Zang, University of Virginia.

Financial support for the research described in this post was provided by the National Human Genome Research Institute under award number R01HG008728; the National Natural Science Foundation of China under award number 31871344; the Fundamental Research Funds for the Central Universities under award numbers N172008008 and N182005005; the 111 Project under award number B16009; the Program for Innovative Talents of Higher Education Institutions in Liaoning Province under award number LR2017018; the Center for Genetic Medicine Research and Gilbert Family Neurofibromatosis Institute at Children's National Hospital; and the Pharmaceutical Research and Manufacturers of America Foundation.
