Feature Story | 30-Apr-2002

Probing cells by computer

DOE/Oak Ridge National Laboratory

A protein structure, predicted at ORNL (top) and the actual structure, determined experimentally (bottom).

Supercomputers are being used at ORNL to increase our knowledge about the structure and function of genes and proteins in living cells.

Analyzing Genomes Computationally

In 2001 scientists using supercomputers suggested we should say goodbye to some common beliefs in biology. No longer was it considered true that the human genome has 100,000 genes, that each gene makes only one protein, and that humans and bacteria have entirely different genes in their cells.

These tenets were tossed out in response to findings of the International Human Genome Sequencing Consortium, which includes the Department of Energy’s Joint Genome Institute (JGI), to which ORNL contributes computational analysis. On February 15, 2001, the consortium published the paper “Initial Sequencing and Analysis of the Human Genome” in the journal Nature. The paper states that the human genome has “about 30,000 to 40,000 protein-coding genes, only about twice as many as in a worm or fly”; that each gene codes for an average of three proteins; and that hundreds of genes may have been transferred from bacteria to the human genome.

Ed Uberbacher, head of the Computational Biology Section in ORNL’s Life Sciences Division (LSD), was one of the hundreds of authors who contributed to this landmark paper. Using the IBM RS/6000 SP supercomputer (Eagle) at ORNL, he and his ORNL colleagues performed computational analysis and annotation of the human genome to uncover evidence of the existence of genes about which little or nothing was known—until this study. To perform their analysis, Uberbacher et al. used the latest version of the Gene Recognition and Analysis Internet Link (GRAIL), which was developed by Uberbacher and others in 1990 at ORNL and was rewritten as GrailEXP for parallel supercomputers. Use of GrailEXP helped provide evidence for alternative splicing—different ways of combining a gene’s protein-coding regions (exons) to produce variants of the complete protein. The evidence suggests that some genes when expressed produce up to 10 different protein products.
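To make the idea of alternative splicing concrete, here is a minimal sketch in Python. It is a toy model only, not how GrailEXP works: it assumes a hypothetical gene whose first and last exons are always kept, and it lists every ordered combination of the exons in between. The exon names and the function `splice_variants` are invented for illustration.

```python
from itertools import combinations

def splice_variants(exons):
    """Return every ordered exon combination that keeps the first and last exon.

    Toy assumption: internal exons may each be kept or skipped independently,
    and exon order never changes.
    """
    if len(exons) < 2:
        return [tuple(exons)]
    first, *middle, last = exons
    variants = []
    for k in range(len(middle) + 1):
        # combinations() preserves the original left-to-right exon order
        for kept in combinations(middle, k):
            variants.append((first, *kept, last))
    return variants

exons = ["E1", "E2", "E3", "E4"]  # hypothetical four-exon gene
variants = splice_variants(exons)
for variant in variants:
    print("-".join(variant))
```

Under these assumptions a gene with four exons yields four candidate transcripts, and the count doubles with each additional internal exon. Real genes use only a small, regulated subset of such combinations.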

Researchers in LSD’s Computational Biology Section have identified many genes in bacterial, mouse, and human genomes. For the JGI they have created and used assembly programs and analysis tools to produce draft sequences of the 300 million DNA base pairs in chromosomes 19, 16, and 5. They have analyzed 25 complete microbial genomes and many JGI draft microbial genomes.

The section’s researchers have written algorithms and developed other tools that make it easier for biologists to use computers to find genes and make sense of the rising flood of biological data. Through ORNL’s popular, user-friendly Genome Channel Web site (150,000 sessions per month) and its Genomic Integrated Supercomputing Toolkit (developed by ORNL’s Phil LoCascio and commonly called GIST), the international biology community, including pharmaceutical industry researchers, has readily obtained meaningful interpretations of its DNA sequences. With help from its supercomputers, ORNL is on the genome analysis map.

Computationally Predicting Protein Structures

In the summer of 2000, an LSD group led by Ying Xu participated in an international competition to predict the three-dimensional (3D) structures of 43 proteins using computational tools. Of the 123 groups competing in the fourth Critical Assessment of Techniques for Protein Structure Prediction competition, this group placed sixth, putting ORNL in roughly the top 5% and ahead of all other DOE national laboratories in the contest.

The actual structures of the 43 target proteins had been determined experimentally by using nuclear magnetic resonance spectroscopy and X-ray crystallography. The computational groups were provided with the identity and order of amino acids making up each protein and the length of the one-dimensional amino-acid sequence. Their predicted structures (obtained in a few weeks) were compared with the experimentally determined structures (obtained in about a year).

Protein structure is the key to protein behavior. Because the function of a protein is related to its shape, it is essential to find or predict correctly the 3D structures of proteins that make us ill or keep us well. Using the details of a protein’s shape, a chemical compound can be custom designed to fit precisely in the protein, like a hand in a glove, blocking or enhancing the protein’s activity. In this way, a highly effective drug with no side effects could be created for an individual.

To speed up drug development, the goal is to predict computationally the structures of 100,000 proteins by aligning different amino-acid sequences along 1000 unique structural folds that are being determined experimentally.
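The idea of aligning a sequence along known folds can be sketched in a few lines of Python. This is a deliberately simplified illustration, not PROSPECT's method: it assumes each fold template is just a string of buried (`B`) and exposed (`E`) positions, scores +1 when a hydrophobic residue lands in a buried position or a non-hydrophobic residue in an exposed one, and allows no gaps. The fold names, templates, scoring rule, and function names are all hypothetical.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residue set for this toy

def position_score(residue, environment):
    """+1 when a residue suits its environment, -1 otherwise (toy scoring)."""
    buried = environment == "B"
    return 1 if (residue in HYDROPHOBIC) == buried else -1

def thread(sequence, template):
    """Best gapless placement of sequence on template.

    Returns (score, offset), or None if the sequence is longer than the template.
    """
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(
            position_score(residue, template[offset + i])
            for i, residue in enumerate(sequence)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best

def best_fold(sequence, folds):
    """Return (fold_name, (score, offset)) for the highest-scoring template."""
    scored = {name: thread(sequence, template) for name, template in folds.items()}
    scored = {name: hit for name, hit in scored.items() if hit is not None}
    return max(scored.items(), key=lambda item: item[1][0])

# Hypothetical fold library: B = buried position, E = exposed position
folds = {"fold_a": "BEBEBEBE", "fold_b": "EEEEBBBB"}
sequence = "KDLV"  # two polar residues followed by two hydrophobic ones

name, (score, offset) = best_fold(sequence, folds)
print(name, score, offset)
```

Here the sequence fits best on the template where its polar residues sit in exposed positions and its hydrophobic residues in buried ones. Production threading programs use far richer energy terms and allow gaps in the alignment.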

ORNL researchers will soon be predicting 100 protein structures a day and evaluating which potential drug molecules dock well with specific proteins by running various automated tools on the Eagle supercomputer. One of those tools is PROSPECT, the Laboratory’s copyrighted protein-threading computer program that brought the group a high world ranking and an R&D 100 Award in 2001. It is giving ORNL good prospects in a field that could shape future health care.

###
