
The ties that bind: WPI researchers search for the hidden genetic code across species

A team led by Dmitry Korkin will use advanced math and computing power to sift through the genomes of animals, plants, fungi, and other organisms to find shared genetic sequences that may point to fundamental cellular functions

Worcester Polytechnic Institute


IMAGE: Dmitry Korkin, Ph.D., is associate professor of computer science at Worcester Polytechnic Institute (WPI).

Credit: Worcester Polytechnic Institute

Worcester, Mass. - If a human being, a worm, a broccoli plant, and a yeast cell share common genetic elements, those snippets of DNA, having remained unchanged over millions of years of evolution, are likely to perform fundamental biological functions.

The National Science Foundation (NSF) has awarded Worcester Polytechnic Institute (WPI) a $768,000 research grant to identify such elements across all known genomes of plants, animals, fungi, and other complex organisms and to gain insight into the roles they play in our cells. Dmitry Korkin, Ph.D., associate professor of computer science and principal investigator for the new project, will use mathematical algorithms and advanced computing technology to sift through vast amounts of genomic data in search of these shared elements.

"We call these sequences long identical multispecies elements, or LIMEs," said Korkin. "To be conserved across species that diverged hundreds of millions of years ago, these elements must carry out some very basic and vital functions in the cells."

Korkin is a member of WPI's Bioinformatics and Computational Biology Program, which uses advanced mathematics and computer science to shed light on basic biology. In the new project, Korkin's team will analyze all the available genomes of eukaryotes, which are organisms whose genetic material is contained within a nucleus. (Bacteria and other simple single-celled organisms do not have nuclei and are called prokaryotes.) Currently, the genomes of some 925 eukaryotic species are sufficiently sequenced for Korkin's analysis; they include many plants and animals, as well as the human genome.

"Just a few years ago, we could not even approach this question, because there was too much data to deal with," Korkin said. "With the technology we had then, the algorithms would have to run, literally, for a thousand years to get a result."

Korkin and his team have made technical leaps, developing new "cache-oblivious" algorithms, which use a computer's memory hierarchy efficiently without being tuned to any specific cache size. The algorithms are designed not only to answer genetic questions but also to make the most of available computer processing power. "You have to understand the hardware you're running on to optimize the algorithms," Korkin said. "What we're seeing in early results is a thousand-fold improvement. What we were doing on big servers that took weeks, we can now do on a laptop in a couple of hours."

A genome is the complete set of DNA molecules that carry the genetic information needed for the development and function of an organism. Famously dubbed "the double helix," a DNA molecule looks like a twisted ladder whose two side rails are linked by rungs made of pairs of just four chemical bases: adenine (A), cytosine (C), guanine (G), and thymine (T). A always pairs with T, and C with G. Those four letters are the entire genetic alphabet. The microscopic worm C. elegans has about 100 million base pairs in its genome, while the human genome runs to about 3 billion.
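Because the genetic alphabet has only four letters, each base can be stored in just two bits. The short Python sketch below illustrates that idea; it is a generic example, not a description of the WPI team's software, and the names `pack`, `unpack`, and `partner_strand` are hypothetical. It also encodes the standard pairing rule (A with T, C with G), which means one rail of the ladder fully determines the other.

```python
# Hypothetical illustration: a 2-bit code per base (four symbols need exactly two bits).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i holds the base whose 2-bit code is i

# Standard base pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pack(seq: str) -> int:
    """Pack a DNA string into a single integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from its packed integer.

    The length is needed because leading A's (code 00) leave no trace in the integer.
    """
    out = []
    for _ in range(length):
        out.append(BASES[value & 0b11])  # read the lowest two bits
        value >>= 2
    return "".join(reversed(out))  # bases were read last-to-first


def partner_strand(seq: str) -> str:
    """Return the complementary bases on the opposite rail, in the same order."""
    return "".join(PAIR[b] for b in seq)


if __name__ == "__main__":
    print(pack("ACGT"))                       # 0b00011011
    print(unpack(pack("GATTACA"), 7))
    print(partner_strand("ACGT"))
```

At two bits per base, one strand of the roughly 3-billion-base-pair human genome would pack into about 750 MB (3 × 10^9 bases × 2 bits ÷ 8 bits per byte); the partner strand need not be stored, since it can be regenerated from the pairing rule.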

Genes are long stretches of base pairs that provide specific instructions for the production of proteins in cells. Genes that code for proteins, however, account for less than two percent of the DNA in human cells. For many years, the remaining 98 percent was called "junk DNA" and thought to be inactive leftovers accumulated over the course of evolution. "We now know that it's really not junk at all," Korkin said. "Those non-coding regions of the genome are emerging as very important for basic development and regulatory functions."

Over the next three years, Korkin's team will work to identify identical (or nearly identical) patterns of base pairs that exist across species and to begin to understand the evolutionary history of those genetic elements and their roles in normal development and in the onset of disease. Korkin expects most LIMEs to fall in non-coding regions, given that those regions dominate the genome, but the project may also identify some common genes.
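To make the search for shared sequences concrete, here is a minimal Python sketch of one naive way to find exactly identical stretches common to several genomes. It is a toy built on stated assumptions, not Korkin's cache-oblivious method: it collects every substring of length k from each genome into a set, intersects the sets, and then binary-searches for the longest length at which some sequence is still shared by every species. The function names and the short "genomes" are hypothetical, and keeping every substring in memory would not scale to genomes of billions of bases, which is the kind of computational problem the project describes.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return every distinct substring of length k in seq (empty if k > len(seq))."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(genomes: Dict[str, str], k: int) -> Set[str]:
    """Return the length-k substrings that occur in every genome (assumes at least one genome)."""
    seqs = list(genomes.values())
    common = kmers(seqs[0], k)
    for seq in seqs[1:]:
        common &= kmers(seq, k)
        if not common:  # nothing left to share; stop early
            break
    return common


def longest_shared_length(genomes: Dict[str, str]) -> int:
    """Binary-search the largest k at which some substring is shared by all genomes.

    This works because sharing is monotone: if a string of length k occurs in
    every genome, so does its prefix of length k - 1.
    """
    lo, hi = 0, min(len(s) for s in genomes.values())
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if shared_kmers(genomes, mid):
            lo = mid  # something of length mid is shared; try longer
        else:
            hi = mid - 1  # nothing of length mid is shared; try shorter
    return lo


if __name__ == "__main__":
    # Hypothetical toy data; these are made-up strings, not real genome sequences.
    toy = {
        "human": "TTACGTACGGA",
        "worm": "CCACGTACGTT",
        "yeast": "GACGTACGAAA",
    }
    best = longest_shared_length(toy)
    print(best, shared_kmers(toy, best))
```

On the toy data, the longest stretch present in all three strings is the seven-letter sequence ACGTACG. A real search would also need to tolerate the "nearly identical" matches mentioned above, which exact set intersection does not handle.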


About Worcester Polytechnic Institute

Founded in 1865 in Worcester, Mass., WPI is one of the nation's first engineering and technology universities. Its 14 academic departments offer more than 50 undergraduate and graduate degree programs in science, engineering, technology, business, the social sciences, and the humanities and arts, leading to bachelor's, master's and doctoral degrees. WPI's talented faculty work with students on interdisciplinary research that seeks solutions to important and socially relevant problems in fields as diverse as the life sciences and bioengineering, energy, information security, materials processing, and robotics. Students also have the opportunity to make a difference to communities and organizations around the world through the university's innovative Global Perspective Program. There are now more than 45 WPI project centers in the Americas, Africa, Asia-Pacific, and Europe.
