A protein structure predicted at ORNL (top) and the actual structure, determined experimentally (bottom).
Supercomputers are being used at ORNL to increase our knowledge about the structure
and function of genes and proteins in living cells.
Analyzing Genomes Computationally
In 2001 scientists using supercomputers suggested we should say goodbye to some
common beliefs in biology. No longer was it considered true that the human genome has
100,000 genes, that each gene makes only one protein, and that humans and bacteria have
entirely different genes in their cells.
These tenets were tossed out in response to findings of the International Human Genome
Sequencing Consortium, including the Department of Energy’s Joint Genome Institute
(JGI), to which ORNL contributes computational analysis. On February 15, 2001, the
consortium published the paper “Initial Sequencing and Analysis of the Human Genome”
in the journal Nature. The paper states that the human genome has “about 30,000 to
40,000 protein-coding genes, only about twice as many as in a worm or fly”; each gene
codes for an average of three proteins; and hundreds of genes may have been transferred
from bacteria into the human genome.
Ed Uberbacher, head of the Computational Biology Section in ORNL’s Life Sciences
Division (LSD), was one of the hundreds of authors who contributed to this landmark
paper. Using the IBM RS/6000 SP supercomputer (Eagle) at ORNL, he and his ORNL
colleagues performed computational analysis and annotation of the human genome to
uncover evidence of the existence of genes about which little or nothing was
known—until this study. To perform their analysis, Uberbacher et al. used the latest
version of the Gene Recognition and Analysis Internet Link (GRAIL), which was
developed by Uberbacher and others in 1990 at ORNL and was rewritten as GrailEXP
for parallel supercomputers. Use of GrailEXP helped provide evidence for alternative
splicing—different ways of combining a gene’s protein-coding regions (exons) to produce
variants of the complete protein. The evidence suggests that some genes, when expressed,
produce up to 10 different protein products.
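To make the idea of alternative splicing concrete, here is a minimal Python sketch of how a single gene's exons could be combined into several transcript variants. It is a toy model, not GrailEXP's method: the function name `splice_variants`, the rule that the first and last exons are always kept, and the short example sequences are all assumptions made for illustration. Real splicing is far more constrained than "any subset of the middle exons."

```python
from itertools import combinations


def splice_variants(exons):
    """Enumerate toy transcript variants for one gene.

    Assumption (illustrative only): the first and last exons are always
    kept, and any subset of the middle exons may be skipped. Exon order
    is always preserved, as it is in exon skipping.
    """
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, middle, last = exons[0], exons[1:-1], exons[-1]
    variants = []
    for keep in range(len(middle) + 1):
        # combinations() yields subsets in their original order
        for chosen in combinations(middle, keep):
            variants.append("".join((first, *chosen, last)))
    return variants


# Hypothetical four-exon gene (sequences are made up)
example = splice_variants(["ATG", "GCC", "TTT", "TAA"])
print(len(example), example)
```

With two optional middle exons, the sketch yields four variants from one gene, which is the sense in which one gene can code for more than one protein.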
Researchers in LSD’s Computational Biology Section have
identified many genes in bacterial, mouse, and human genomes. For
the JGI they have created and used assembly programs and analysis
tools to produce draft sequences of the 300 million DNA base pairs
in chromosomes 19, 16, and 5. They have analyzed 25 complete
microbial genomes and many JGI draft microbial genomes.
The section’s researchers have written algorithms and developed
other tools that make it easier for biologists to use computers to find
genes and make sense out of the rising flood of biological data.
Through ORNL’s popular, user-friendly Genome Channel Web site
(150,000 sessions per month) and its Genomic Integrated
Supercomputing Toolkit (developed by ORNL’s Phil LoCascio and commonly called
GIST), researchers in the international biology community, including those in the pharmaceutical
industry, have readily obtained meaningful interpretations of their DNA sequences.
With help from its supercomputers, ORNL is on the genome analysis map.
Computationally Predicting Protein Structures
In the summer of 2000, an LSD group led by Ying Xu participated in an international
competition to predict the three-dimensional (3D) structures of 43 proteins, using
computational tools. Of the 123 groups competing in the fourth Critical Assessment of
Techniques for Protein Structure Prediction competition, this group placed sixth, putting
ORNL in the top 4% and placing it ahead of all other DOE national laboratories in the
competition.
The actual structures of the 43 target proteins had been
determined experimentally by using nuclear magnetic resonance
spectroscopy and X-ray crystallography. The computational
groups were provided with the identity and order of amino
acids making up each protein and the length of the
one-dimensional amino-acid sequence. Their predicted
structures (obtained in a few weeks) were compared with the
experimentally determined structures (obtained in about a year).
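The article does not say which measure the assessors used to compare predicted and experimental structures. One common measure is the root-mean-square deviation (RMSD) between matching atom positions, sketched below in Python. This is an assumption for illustration only: the function name `rmsd` and the sample coordinates are hypothetical, and the sketch assumes the two structures have already been superimposed.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes both structures list the same atoms in the same order and
    are already superimposed; no alignment is performed here.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, experimental):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


# Made-up coordinates: every atom is displaced by a 3-4-5 triangle
print(rmsd([(0, 0, 0), (1, 0, 0)], [(3, 4, 0), (4, 4, 0)]))
```

A smaller value means the predicted shape lies closer to the experimental one.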
Protein structure is the key to protein behavior. Because the
function of a protein is related to its shape, it is essential to find
or predict correctly the 3D structures of proteins that make us
ill or keep us well. Using the details of a protein’s shape, a
chemical compound can be custom designed to fit precisely in
the protein, like a hand in a glove, blocking or enhancing the
protein’s activity. In this way, a highly effective drug with no side effects could be
created for an individual.
To speed up drug development, the goal is to predict computationally the structures of
100,000 proteins by aligning different amino-acid sequences along 1000 unique structural
folds that are being determined experimentally.
ORNL researchers will soon be predicting 100 protein structures a day and evaluating
which potential drug molecules dock well with specific proteins by running various
automated tools on the Eagle supercomputer. One of those tools is PROSPECT, the
Laboratory’s copyrighted protein-threading computer program that brought the group a
high world ranking and an R&D 100 Award in 2001. It is giving ORNL good prospects in
a field that could shape future health care.
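The general idea behind protein threading, fitting an amino-acid sequence onto known structural folds and keeping the best fit, can be sketched in a few lines of Python. This is a deliberately simplified toy and not PROSPECT's algorithm. The scoring rule (hydrophobic residues prefer buried positions), the set `HYDROPHOBIC`, the gapless sliding alignment, and the fold names and templates are all assumptions made for illustration.

```python
# Assumed set of hydrophobic residues (one-letter codes); illustrative only.
HYDROPHOBIC = set("AVILMFWC")


def position_score(residue, environment):
    """+1 if the residue suits the position ('B' buried, 'E' exposed), else -1."""
    return 1 if (residue in HYDROPHOBIC) == (environment == "B") else -1


def thread(sequence, template):
    """Slide the sequence along one fold template without gaps.

    Returns (best_score, offset), or None if the sequence is too long.
    """
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(
            position_score(res, template[offset + i])
            for i, res in enumerate(sequence)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def best_fold(sequence, fold_library):
    """Return (fold_name, (score, offset)) for the best-scoring fold."""
    hits = {}
    for name, template in fold_library.items():
        hit = thread(sequence, template)
        if hit is not None:
            hits[name] = hit
    if not hits:
        raise ValueError("no fold template is long enough for the sequence")
    name = max(hits, key=lambda n: hits[n][0])
    return name, hits[name]


# Hypothetical fold library: each string marks buried/exposed positions
folds = {"fold_a": "EBBEEEBE", "fold_b": "EEEEBBBB"}
print(best_fold("AVKDEL", folds))
```

A production threading program would also allow gaps and use much richer energy terms. The sketch only shows why a library of known folds makes the search tractable: each new sequence is scored against a fixed set of templates rather than folded from scratch.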