Investigators at Nationwide Children's Hospital have developed an analysis "pipeline" that slashes the time it takes to search a person's genome for disease-causing variations from weeks to hours. An article describing the ultra-fast, highly scalable software was published in the latest issue of Genome Biology.
"It took around 13 years and $3 billion to sequence the first human genome," says Peter White, PhD, principal investigator and director of the Biomedical Genomics Core at Nationwide Children's and the study's senior author. "Now, even the smallest research groups can complete genomic sequencing in a matter of days. However, once you've generated all that data, that's the point where many groups hit a wall. After a genome is sequenced, scientists are left with billions of data points to analyze before any truly useful information can be gleaned for use in research and clinical settings."
To overcome the challenges of analyzing that large amount of data, Dr. White and his team developed a computational pipeline called "Churchill." By using novel computational techniques, Churchill allows efficient analysis of a whole genome sample in as little as 90 minutes.
"Churchill fully automates the analytical process required to take raw sequence data through a series of complex and computationally intensive processes, ultimately producing a list of genetic variants ready for clinical interpretation and tertiary analysis," Dr. White explains. "Each step in the process was optimized to significantly reduce analysis time, without sacrificing data integrity, resulting in an analysis method that is 100 percent reproducible."
The output of Churchill was validated using National Institute of Standards and Technology (NIST) benchmarks. In comparison with other computational pipelines, Churchill was shown to have the highest sensitivity at 99.7 percent, the highest accuracy at 99.99 percent, and the highest overall diagnostic effectiveness at 99.66 percent.
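For readers unfamiliar with these benchmark metrics, the sketch below shows how sensitivity and accuracy are conventionally computed from variant-call counts. The counts are hypothetical and chosen only for illustration; they are not taken from the Churchill study. The formula for "diagnostic effectiveness" is not given here, so it is left out.

```python
# Standard definitions of two benchmark metrics for variant calling.
# All counts below are hypothetical illustration values, not study data.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that the pipeline detected."""
    return tp / (tp + fn)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all calls (variant or non-variant) that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical comparison against a benchmark truth set:
tp, fn, fp, tn = 997, 3, 1, 9999

print(f"sensitivity: {sensitivity(tp, fn):.1%}")
print(f"accuracy:    {accuracy(tp, tn, fp, fn):.2%}")
```

In practice the true-negative count is dominated by the very large number of non-variant positions in a genome, which is one reason reported accuracy figures tend to sit close to 100 percent.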
"At Nationwide Children's we have a strategic goal to introduce genomic medicine into multiple domains of pediatric research and healthcare. Rapid diagnosis of monogenic disease can be critical in newborns, so our initial focus was to create an analysis pipeline that was extremely fast, but didn't sacrifice clinical diagnostic standards of reproducibility and accuracy" says Dr. White. "Having achieved that, we discovered that a secondary benefit of Churchill was that it could be adapted for population scale genomic analysis."
By examining computational resource use during data analysis, Dr. White's team demonstrated that Churchill was both highly efficient (>90 percent resource utilization) and scaled very effectively across many servers. Alternative approaches limit analysis to a single server and have resource utilization as low as 30 percent. This efficiency and ability to scale make population-scale genomic analysis feasible.
To demonstrate Churchill's capability to perform population-scale analysis, Dr. White and his team received an award from the Amazon Web Services (AWS) in Education Research Grants program. The award enabled them to successfully analyze phase 1 of the raw data generated by the 1000 Genomes Project, an international collaboration to produce an extensive public catalog of human genetic variation, representing multiple populations from around the globe. Using cloud-computing resources from AWS, Churchill completed analysis of 1,088 whole genome samples in seven days and identified millions of new genetic variants.
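As a rough back-of-the-envelope illustration of that throughput, the short calculation below uses only the two figures reported above (1,088 genomes, seven days). The result is an aggregate rate across all cloud servers working in parallel, not the runtime of any single genome.

```python
# Aggregate throughput implied by the reported figures.
GENOMES = 1088   # whole genome samples analyzed
DAYS = 7         # total wall-clock time

genomes_per_day = GENOMES / DAYS
# Average wall-clock minutes elapsed per completed genome, cluster-wide.
minutes_per_genome = DAYS * 24 * 60 / GENOMES

print(f"{genomes_per_day:.1f} genomes per day")           # 155.4 genomes per day
print(f"{minutes_per_genome:.1f} minutes per genome")     # 9.3 minutes per genome
```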
"Given that several population-scale genomic studies are underway, we believe that Churchill may be an optimal approach to tackle the data analysis challenges these studies are presenting," says Dr. White.
The Churchill algorithm was licensed to Columbus-based GenomeNext LLC, which has built upon the Churchill technology to develop a secure and automated software-as-a-service platform. Users simply upload raw whole-genome, exome or targeted panel sequence data to the GenomeNext system and run an analysis that not only identifies genetic variants but also generates fully annotated datasets for filtering and identifying pathogenic variants. The company provides genomic data analysis solutions that simplify data management and automate the analysis of large-scale genomic studies. The system was also developed with the research and clinical markets in mind, offering a standardized pipeline that is well suited to settings where customers have to meet regulatory requirements.
Kelly BJ, Fitch JR, Hu Y, Corsmeier DJ, Zhong H, Wetzel AN, Nordquist RD, Newsom DL, White P. Churchill: an ultra-fast, deterministic, highly scalable and highly balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics. Genome Biology 2015, 16:6. doi:10.1186/s13059-014-0577-x
About Dr. Peter White
Peter White, PhD, is a principal investigator in the Center for Microbial Pathogenesis at The Research Institute at Nationwide Children's Hospital and an Assistant Professor of Pediatrics at The Ohio State University. He is Director of the Biomedical Genomics Core, a nationally recognized microarray and next-generation sequencing facility assisting numerous investigators in the design, production and analysis of genomics data. He is also Director of Molecular Bioinformatics, serving on the Research Computing Executive Governance Committee. His research program focuses on high-performance computing solutions for "big data" and the discovery of human genetic variation associated with diseases. Dr. White received his PhD in Molecular Biology from the University of Cambridge, England and completed his postdoctoral training in the Department of Genetics at The University of Pennsylvania, Philadelphia. He has over 15 years of experience in the field of genomics and computational biology and has authored over 50 peer-reviewed publications.
About Nationwide Children's Hospital
Ranked 7th of only 10 children's hospitals on U.S. News & World Report's 2014-15 "America's Best Children's Hospitals Honor Roll" and among the Top 10 on Parents magazine's 2013 "Best Children's Hospitals" list, Nationwide Children's Hospital is one of the nation's largest not-for-profit freestanding pediatric healthcare networks providing care for infants, children and adolescents as well as adult patients with congenital disease. As home to the Department of Pediatrics of The Ohio State University College of Medicine, Nationwide Children's faculty train the next generation of pediatricians, scientists and pediatric specialists. The Research Institute at Nationwide Children's Hospital is one of the Top 10 National Institutes of Health-funded free-standing pediatric research facilities in the U.S., supporting basic, clinical, translational and health services research at Nationwide Children's. The Research Institute encompasses three research facilities totaling 525,000 square feet dedicated to research. More information is available at NationwideChildrens.org/Research.
About GenomeNext
GenomeNext is a genomic informatics company dedicated to accelerating the promise and capability of predictive medicine and scientific discovery. We commercialize genomic analysis tools and integrated systems for the evaluation of genetic variation and function. Our advanced informatics and data management solutions are designed to simplify, expedite and enhance genetic analysis workflows. Our solutions provide the market with genomic data and analysis at an unprecedented combination of performance, quality, cost and scale without requiring the investment in high-performance computing resources and specialized personnel. Our proprietary platforms address a broad range of highly interconnected markets, including sequencing, genotyping, gene expression, and molecular diagnostics. Our customers include leading genomic research centers, academic institutions, government laboratories, and clinical research organizations, as well as pharmaceutical, biotechnology, agrigenomics, and consumer genomics companies.