Next-generation cancer strategies rely on next-generation gene sequencing (NGS), which paves the way for new techniques and tools to detect mutations and determine patient therapy. A team of Chinese researchers proposed a more effective strategy to filter false positive results, which improves the accuracy and efficiency of cancer diagnosis and treatment.
The research team proposed DeepFilter, a deep-learning based filter for removing false positives in somatic variants in NGS data.
Their study was published on January 06, 2023 in Tsinghua Science and Technology.
Finding somatic mutations, DNA alterations acquired in body cells rather than inherited, is key to understanding lethal genetic diseases such as cancer. Next-generation gene sequencing accelerates the search for somatic mutations by fragmenting DNA or RNA into many pieces and sequencing them in parallel, producing thousands or millions of sequences concurrently. This technique improves accuracy while reducing the cost and time of sequencing.
Powerful “calling tools” comb through NGS data and track down tumor-associated or other mutations by comparing tumor sequences against a reference, such as matched normal tissue from the same individual.
VarDict is a somatic variant calling tool used commonly in clinical research. Previous studies have shown that VarDict achieves higher accuracy rates and detects more true variants than similar calling tools. However, VarDict also generates a higher number of false positives than other callers, which can skew results.
“An error rate of 1:10,000 in a genome with 3 billion positions would result in many false calls, which may lead to inaccurate clinical diagnoses,” said Zekun Yin, a study author from Shandong University. “However, filtering true positives may also lead to missed diagnoses.”
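To put the quoted figure in concrete terms, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration only: it assumes every one of the 3 billion positions is called independently at the stated rate, and the function and constant names are chosen here for illustration.

```python
# Back-of-the-envelope estimate of false calls across a genome.
# Assumption: each position is called independently at the quoted error rate.
GENOME_POSITIONS = 3_000_000_000  # "3 billion positions" from the quote
ONE_ERROR_PER = 10_000            # "error rate of 1:10,000" from the quote


def expected_false_calls(positions: int, one_error_per: int) -> int:
    """Return the expected number of erroneous calls (integer arithmetic)."""
    return positions // one_error_per


false_calls = expected_false_calls(GENOME_POSITIONS, ONE_ERROR_PER)
print(f"{false_calls:,}")  # prints 300,000
```

Under these assumptions, roughly 300,000 positions would be called incorrectly, which is why manual review of every candidate is impractical.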
Typically, researchers filter out some of the false positives manually – an onerous, costly process that the Chinese research team set out to alleviate.
“It will save a lot of time and money if we provide an automatic method to effectively filter out most of the false positives,” said Hao Zhang, a study author from Shandong University.
Inspired by recent successes integrating machine-learning based methods to call genetic variants from NGS data, the Chinese research team introduced a deep-learning based variant filter. Dubbed DeepFilter, the filter is designed to screen out false positive variants generated by VarDict while preserving high calling sensitivity.
DeepFilter treats the task of distinguishing whether a variant is true or false as a binary classification problem. The researchers used three types of datasets to train and test DeepFilter: real-world tumor-normal sample data, a mixture of two gold-standard datasets, and synthetic data.
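The binary-classification framing can be illustrated with a toy example. The sketch below is not DeepFilter's model or its input features: it swaps in plain logistic regression for the deep network, and the three features (variant allele frequency, normalized read depth, normalized mapping quality) and all data values are hypothetical, made up for illustration.

```python
import math

# Hypothetical per-variant features (NOT DeepFilter's actual inputs):
# (variant allele frequency, normalized read depth, normalized mapping quality).
# Label 1 = true somatic variant, 0 = false positive. All values are invented.
TRAIN = [
    ((0.45, 0.90, 0.95), 1),
    ((0.38, 0.80, 0.90), 1),
    ((0.52, 0.85, 0.99), 1),
    ((0.30, 0.70, 0.92), 1),
    ((0.03, 0.20, 0.40), 0),
    ((0.05, 0.15, 0.55), 0),
    ((0.02, 0.30, 0.35), 0),
    ((0.06, 0.25, 0.50), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability that a candidate variant is a true variant."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(data, epochs=2000, lr=0.5):
    """Fit logistic regression with per-sample gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def keep_variant(weights, bias, features, threshold=0.5):
    """Keep a call only if the classifier scores it as a true variant."""
    return predict_proba(weights, bias, features) >= threshold


weights, bias = train(TRAIN)
```

Raising the `threshold` argument would discard more false positives at the risk of also discarding true variants, which mirrors the trade-off between filtering and sensitivity described in the article.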
The experimental results based on both synthetic and real-world NGS data were promising:
“DeepFilter outperformed other filters in terms of false positive variant filter tasks, which made VarDict more valuable in practical clinical research and greatly facilitated downstream analysis in biological research and patient treatment,” said Zhang.
The team plans to wade deeper into the problem of false-positive variant filtering, looking specifically at the positive and negative sample imbalance problem and incorporating other machine learning and deep-learning methods for filtering.
“Our ultimate goal is to solve the problem of running efficiency and accuracy of variation calling and provide a state-of-the-art variation detection tool,” said Yin.
This work was supported by the National Natural Science Foundation of China, the Shenzhen Basic Research Fund, the Key Project of Joint Fund of Shandong Province, Shandong Provincial Natural Science Foundation, and Engineering Research Center of Digital Media Technology, Ministry of Education, China.
Other contributors include Yanjie Wei from the Chinese Academy of Sciences, Bertil Schmidt from Johannes Gutenberg University and Weiguo Liu from Shandong University.
The paper is also available on SciOpen (https://www.sciopen.com/article/10.26599/TST.2022.9010032) by Tsinghua University Press.
About Tsinghua Science and Technology
Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Tsinghua Science and Technology is indexed and abstracted in SCIE, EI, Scopus, Google Scholar, INSPEC, SA, Cambridge Abstract, CSCD, CNKI, etc. Contributions from all over the world are welcome.
About Tsinghua University Press
Established in 1980 and affiliated with Tsinghua University, Tsinghua University Press (TUP) is a leading comprehensive higher education and professional publisher in China. Committed to building a top-level global cultural brand, TUP has, over 41 years of development, established an outstanding managerial system and enterprise structure, and delivered multimedia and multi-dimensional publications covering books, audio, video, electronic products, journals and digital publications. In addition, TUP is actively carrying out a strategic transformation from educational publishing to content development and services for teaching and learning, and was named First-class National Publisher for achieving remarkable results.
Journal: Tsinghua Science & Technology
Article title: DeepFilter: A deep learning based variant filter for VarDict