When a sudden outbreak of a strange viral illness called Severe Acute Respiratory Syndrome (SARS) occurred last year, the Centers for Disease Control and Prevention (CDC) sought help from a team of Lawrence Livermore biologists, mathematicians, and computer scientists. Within three hours of receiving the first sequenced genome (genetic blueprint) of the virus from the CDC, the Livermore team produced several candidate signatures of the pathogen (disease-causing microbe). Signatures are specific regions of DNA or RNA that uniquely identify a pathogen. The SARS case was one of many in which the group has developed signatures using a novel whole-genome analysis approach that is changing pathogen diagnostic design.
The team, part of the Laboratory's Biology and Biotechnology Research Program (BBRP), has been on the front lines of the nation's biodefense effort since 2001. Eleven computer scientists, biologists, and mathematicians led by computer scientist Tom Slezak make up one of the largest pathogen bioinformatics groups. Their work spans the full spectrum of effort, from identifying signature candidates to developing DNA-based signatures and deploying validated assays in the field. Team members have traveled throughout the nation, often with only a few hours' notice, to support the national effort to defend against bioterrorism.
Biological weapons could include bacteria (anthrax, plague), DNA viruses (smallpox), RNA viruses (Ebola, SARS, foot-and-mouth disease), fungi (soybean rust, corn rust), protozoa (giardiasis), and toxins (ricin). Pathogens such as these and many others could be used to sicken or kill urban populations, livestock, or crops. Early detection and unmistakable identification are crucial to limiting the potentially catastrophic human and economic costs of a bioattack.
The team receives many types of signature requests. One request may be for all strains of a normally pathogenic species, including its nonpathogenic and vaccine strains. Another request may be for all of the pathogenic strains of a particular species. Fulfilling these requests can be difficult because, while there may be hundreds of strains of a particular species, genomic sequences may exist for only a few. Strains may also vary in pathogenicity, and their genetic near-neighbors may or may not be virulent or may affect hosts other than humans. In addition, RNA viruses have extremely high mutation rates, so it may be difficult or impossible to find enough stable regions suitable for use as a signature.
The Livermore bioinformatics team has developed DNA-based signatures of virtually every biothreat pathogen (the organisms identified by the CDC as high-priority threat agents) for which adequate genomic sequences are available as well as for several other human and livestock pathogens. Signature requests come from agencies such as the Department of Energy (DOE), the CDC's Laboratory Response Network and BioWatch Program, the Department of Agriculture, the Food and Drug Administration, and the Department of Defense. Livermore signatures are part of the nation's public health system and have been in use for homeland defense since fall 2001.
Pipeline Called KPATH
Livermore's signature pipeline, called KPATH, is used to develop the signatures of bacterial and viral pathogens. This Livermore-designed system is a fully automated DNA-based signature "pipeline," able to deliver signature candidates (spanning 200–300 base pairs of DNA) in minutes to hours. In simplest terms, KPATH works by comparing the genome of the target pathogen to a library of microbial genomes, searching for those areas that are unique to the target organism.
KPATH uses the software programs Multiple Genome Aligner (MGA) and Vmatch, which were developed by collaborators in Germany. MGA aligns the multiple genomes of a target pathogen, and Vmatch uses efficient algorithms to quickly compare the genome of interest with all other sequenced microbial genomes. "These software tools allow the pathogen genomes themselves to show us which regions of DNA are important," Slezak says. The DNA regions that are significant to the pathogen are conserved among all strains of the pathogen sequenced to date and are unique when compared to all other organisms sequenced to date. That is, they are present in every strain of the pathogen and absent in all other organisms.
The algorithms work by locating those portions of the genome that are not unique and eliminating them from consideration. "In this way," says Slezak, "we define regions of apparent uniqueness and mine them for candidate signatures."
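The idea of keeping only regions that are conserved in every target strain and absent from every other genome can be sketched in a few lines of Python. This is an illustrative toy, not KPATH itself: it swaps in simple k-mer set operations in place of the MGA alignment and Vmatch comparison the pipeline actually uses, it ignores the reverse strand, and the function names and the parameters `k` and `min_len` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_regions(target_strains, background_genomes, k=8, min_len=12):
    """Find stretches of the first target strain whose k-mers occur in
    every target strain and in none of the background genomes.

    Returns a list of (start, end, sequence) tuples, with end exclusive.
    Assumption: exact k-mer matching stands in for real alignment.
    """
    reference = target_strains[0]
    # k-mers present in every sequenced strain of the target (conserved).
    conserved = set.intersection(*(kmers(s, k) for s in target_strains))
    # k-mers seen anywhere in the non-target library (not unique).
    background = set().union(*(kmers(g, k) for g in background_genomes))
    # Eliminate the non-unique portions from consideration.
    keep = conserved - background

    regions = []
    start = None
    n_positions = len(reference) - k + 1
    # Walk the reference and merge consecutive kept k-mers into regions.
    for i in range(n_positions + 1):
        ok = i < n_positions and reference[i:i + k] in keep
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            end = i - 1 + k  # exclusive end of the last kept k-mer
            if end - start >= min_len:
                regions.append((start, end, reference[start:end]))
            start = None
    return regions
```

With two toy "strains" that share the stretch GATTACA and a background genome that does not contain it, `candidate_regions(["TTTTTTGATTACACCCCCC", "CCCCCCGATTACATTTTTT"], ["TTTTTTTTCCCCCCCC"], k=4, min_len=6)` reports that one stretch as the only region of apparent uniqueness. Adding a background genome that contains GATTACA leaves no candidates.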
Candidate signatures must then be verified in the laboratory. "It's a long path from candidate signature to validated assay," notes Slezak. Hundreds of thousands of candidate signatures are computationally screened. Wet-chemistry procedures reduce that number to hundreds and then dozens. Much of the laboratory testing takes place at the CDC and other organizations that are certified to work with virulent pathogens. Once a signature is verified, the final step is optimizing the signature for a specific detection chemistry or instrument using a specific protocol. When that process is complete, the signature is called an assay.
One of KPATH's important features is that it automatically downloads newly sequenced pathogen genomes from all major public databases, and all validated and fielded assays are verified weekly as the new sequence data are acquired. "As known strains evolve and new strains are discovered and their genomes sequenced, some of the 'unique' regions will erode," says Slezak. "We'll then need to refine the signature."
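The periodic re-verification described above can be pictured as a simple check of each fielded signature against newly acquired sequences. The sketch below is a hypothetical illustration rather than the KPATH implementation: it assumes exact substring matching on one strand, whereas real assays tolerate some mismatches, and the function name and data layout are invented for this example.

```python
def recheck_signatures(signatures, new_target_strains, new_other_genomes):
    """Flag signatures that newly acquired sequence data have eroded.

    signatures:          dict of signature name -> DNA sequence
    new_target_strains:  dict of strain label -> genome (should contain it)
    new_other_genomes:   dict of organism label -> genome (should not)

    Returns a dict of signature name -> list of problem descriptions.
    Signatures with no problems are omitted from the report.
    """
    report = {}
    for name, sig in signatures.items():
        problems = []
        # Conservation check: every new target strain should carry it.
        for label, genome in new_target_strains.items():
            if sig not in genome:
                problems.append(f"missing from target strain {label}")
        # Uniqueness check: no other organism should carry it.
        for label, genome in new_other_genomes.items():
            if sig in genome:
                problems.append(f"also found in non-target {label}")
        if problems:
            report[name] = problems
    return report
```

A signature that appears in the report is a candidate for refinement. An empty report means the fielded signatures still look conserved and unique against the latest data, within the limits of this simplified matching.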
Olympic Games Motivate
In early 2000, the DOE's Chemical and Biological National Security Program (CBNP) began a national pathogen-detection effort following the announcement by then-Secretary Richardson that DOE would be providing biosecurity at the 2002 Winter Olympic Games in Salt Lake City, Utah. Lawrence Livermore was assigned the task of developing reliable and validated assays for a number of the most likely bioterrorism agents.
The bioinformatics team reasoned that a whole-genome analysis approach--that is, comparing a target pathogen genome against all other sequenced microbial genomes--would reveal which regions of the DNA were unique. They also believed the process could be automated to get results more quickly. Until the Livermore approach, signature design was a time-consuming, expensive process done largely by hand and guided heavily by intuition. Analysis was generally limited to sequences from a few genes thought to be important. Traditional approaches to DNA-based signature development started with the assumption that a particular gene was vital to an organism's virulence, host range, or other factor. The resulting assay would then be tested with the available strain. This approach would at times yield good results, but it frequently resulted in failure. Computational support for diagnostic development was rare. "The time was ripe for radical changes in this field," says Slezak.
In August 2000, the team began building a set of tools that would accomplish these goals. Slezak says, "We used techniques and mindsets from our many years of experience working on the Human Genome Project (HGP)." Slezak formerly led Livermore's HGP bioinformatics effort and later the Joint Genome Institute's informatics effort. BBRP scientist Paula McCready led Livermore's HGP sequencing effort for several years and was the first leader of the sequencing effort at DOE's Joint Genome Institute.
The concept of an automated signature-design system began with a crude algorithm and a proof-of-principle test that took about one week. The goal was to develop a signature for Bacillus anthracis, the bacterium that causes anthrax. When the test proved successful, Slezak began to search for more efficient algorithms.
"In October 2000," says Slezak, "we began building a preliminary pipeline based on this approach with funding obtained from the Laboratory Directed Research and Development Program. In May 2002, we were funded by DOE to build the current KPATH pipeline. We continued to use the first pipeline until about January 2003, when KPATH was shown to be functionally equivalent and much faster."
New Features on the Horizon
The bioinformatics team is developing additional features for KPATH, including the capability to generate multiple types of signatures to support different detection chemistries and machines, better algorithms to improve processing efficiencies, and improved capabilities for developing signatures of RNA viral genomes. Signature development for RNA viruses is particularly difficult because they mutate so rapidly. To overcome this difficulty, the team is building a pipeline for protein signatures--the other major approach to pathogen detection.
Protein signatures are commonly used in diagnostic kits, such as commercially available home-use pregnancy tests. Slezak notes that the sequence of amino acids that make up a protein tends to be conserved (unchanged) because altering the protein sequence is likely to change the protein's shape, which in turn would alter its function. Using this approach, the team has found conserved and unique signature regions in the glycoprotein of the West Nile virus (an RNA virus) and has mapped these regions to three-dimensional protein structure models created by Livermore mathematician Adam Zemla. Antibodies derived from these regions are being tested at the University of California at Davis to verify that the identified regions are indeed unique.
The team also plans to develop fungal signatures. However, fungal DNA is more difficult to analyze than bacterial DNA because fungal genomes are larger. Because of funding constraints, only a few fungal genomes have been sequenced. As a result, it is difficult to know which genomic regions are common to fungi and thus are not useful for signatures.
Another ongoing task for the team is building and maintaining relationships with partners in various agencies and universities. Slezak explains, "Much of the data we need are not in the public domain."
In the meantime, researchers worldwide are regularly publishing the sequences of new and updated genomes. As additional pathogens are sequenced, the Livermore team will continue to provide rapid computational analysis and develop DNA-based and protein signatures to help thwart bioterrorists.