
Mouse genome revealing which DNA sequences activate human genes

DOE/Lawrence Berkeley National Laboratory

BERKELEY, CA -- The already high value of the mouse as a model for studying the human genome has been raised even higher with the results of a new study by researchers with the U.S. Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) and the University of California at San Francisco (UCSF).

In a paper published in the April 7 issue of the journal Science, the researchers report that comparative analysis techniques used to identify DNA sequences coding for genes in mice and humans can also be used to identify sequences that regulate the "expression" or activation of genes.

"You could call this finding jewels in junk DNA," says Edward Rubin, a geneticist with Berkeley Lab's Life Sciences Division, and co-leader, along with Kelly Frazer, of the study. "By comparing human and mouse sequences we can identify those segments of the genome that contain information which instructs surrounding genes on when and where they are to be active. Identifying these sorts of regulatory sequences using classical biological approaches is labor intensive and difficult."

Evolutionary conservation of non-coding DNA sequences that play an important role in regulating gene expression is the key to the success of this study, just as it has been a key to identifying DNA sequences that code for genes across different species.

"If evolution conserved a sequence over the 70-90 million years since mice and humans diverged, it likely has a function," says Frazer. "Whether its function is to determine the structure of a protein coded for by a gene or to regulate gene expression, we should be able to identify these sequences through mouse to human sequence comparisons."

In addition to Rubin and Frazer, other authors of the Science paper were Gabriela Loots and Cathy Blankespoor of Berkeley Lab, Richard Locksley and Zhi-En Wang of UCSF and the Howard Hughes Medical Institute, and Webb Miller at Penn State University.

As the various genome projects (including those for the human and the mouse) speed toward completion, scientists are already moving to the next phase: identifying the DNA sequences that have critical functions. Using computers to compare the sequence of a known mouse gene with human DNA has proven to be an effective way to identify the corresponding human gene. Rubin and his research group have used this approach to help identify genes linked to Down syndrome, atherosclerosis, and, most recently, asthma.

However, approximately 95 percent of the sequences in the human genome do not code for genes. Although these sequences were once dismissed as "junk" DNA, it has long been known that some of them perform important duties, including the regulation of gene expression. These non-coding sequences are also believed to have been conserved between related species such as mice and humans, just as gene-coding sequences have been.

To search for conserved non-coding sequences (CNSs), Rubin, Frazer, and their colleagues examined a stretch of mouse and human DNA, about a million base-pairs long, that contained the same 23 genes in both species, including three interleukin genes (IL-4, IL-13, and IL-5). Previous studies indicated that these interleukin genes are similarly regulated and that their regulatory sequences may be conserved in mice and humans.

The Berkeley researchers looked for CNSs that were at least 70 percent identical in both species over at least 100 base-pairs. Of the 90 CNSs they identified that met these criteria, the researchers selected 15 for a cross-species sequence analysis that also included DNA from a cow, a dog, a pig, a rabbit, a rat, a chicken, and a fish. Most of these elements were also present in the other mammals, indicating that they have most likely been conserved because they perform an important biological function.
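To make the "70 percent identical over at least 100 base-pairs" criterion concrete, here is a minimal sketch of a sliding-window scan. It is not the software used in the study, which this release does not describe. It assumes the human and mouse sequences have already been aligned to equal length (with "-" marking gaps), and it omits the step of excluding known coding exons, which a real CNS search would need. The function name conserved_windows and its parameters are hypothetical.

```python
from typing import List, Tuple


def conserved_windows(human: str, mouse: str,
                      window: int = 100,
                      min_identity: float = 0.70) -> List[Tuple[int, int]]:
    """Hypothetical helper: return merged (start, end) column intervals,
    end exclusive, covering every `window`-column stretch of a pre-computed
    human/mouse alignment whose identity is at least `min_identity`.
    Assumes equal-length aligned strings; coding-exon masking is omitted."""
    if len(human) != len(mouse):
        raise ValueError("sequences must be pre-aligned to equal length")
    n = len(human)
    if n < window:
        return []

    # 1 where both species carry the same base; gaps and ambiguous bases never match
    match = [1 if a == b and a in "ACGT" else 0
             for a, b in zip(human.upper(), mouse.upper())]

    hits = []
    count = sum(match[:window])  # identical columns in the first window
    for start in range(n - window + 1):
        if start > 0:
            # slide one column: drop the column that left, add the one that entered
            count += match[start + window - 1] - match[start - 1]
        if count / window >= min_identity:
            hits.append((start, start + window))

    # merge overlapping or touching qualifying windows into contiguous regions
    merged: List[Tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Toy alignment: a 100-column identical core (columns 50-149) flanked by mismatches
human = "A" * 50 + "ACGT" * 25 + "A" * 50
mouse = "C" * 50 + "ACGT" * 25 + "C" * 50
print(conserved_windows(human, mouse))  # [(20, 180)]
```

Note that the reported region extends 30 columns past each edge of the identical core, because any 100-column window with at least 70 matching columns qualifies. Window-based scans trade this boundary blurring for robustness to scattered single-base differences.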

The cross-species sequence analysis was followed by an in-depth functional analysis of the largest of the 15 sequences, CNS-1, which spans 401 base-pairs and is located between IL-4 and IL-13. Studies in multiple lines of transgenic mice characterized the biological properties of CNS-1 and revealed it to be a "coordinate regulator" of the three interleukin genes, activating them by modulating the structure of chromatin. According to the authors, no standard in vitro assay could have been used to make this determination.

"What is unique about our study is that we were led to the interleukin regulatory element CNS-1 entirely by computational analysis of mouse and human sequences," says Rubin. "Since we are soon to have the entire genomes of mice and humans sequenced, our study demonstrates one successful strategy of interpreting the sequence information coming from the genome program into meaningful biology."

Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California. Visit our Web site at


MEDIA CONTACT INFORMATION: Dr. Edward M. Rubin heads Berkeley Lab's Genome Sciences Department and also leads the functional genomics program at DOE's Joint Genome Institute. He can be reached by phone at 510-486-5072 or by e-mail at
