News Release

Microarrays, key genome expression trackers, work better when probes are sequence-verified

Many widely used probes don't match latest RefSeq database information

Peer-Reviewed Publication

American Physiological Society

BETHESDA, Md. (July 22, 2004) -- Microarrays, sometimes referred to as biochips, have been used extensively to investigate genome-wide expression patterns and have facilitated a revolution in the characterization of cellular regulation. In addition, comprehensive gene expression profiling shows great potential for human disease diagnostics.

For instance, multiple research groups have shown that microarray data can identify previously unappreciated molecular subtypes of lung cancer that differ in their prognoses. Unfortunately, results are often poorly reproducible across studies.

Furthermore, there is now a tremendous volume of data, particularly from human clinical specimens, that cannot be duplicated, so strategies to improve the analysis of (that is, to "clean up") existing data sets are needed. One limitation of microarray technology may be that similar studies fail to measure identical biological parameters. In other words, the problem could arise because many microarray probes (there are now up to hundreds of thousands on a single slide) are based on gene sequences that are five or more years old.

Background

Frustrated by more than two years of trying to analyze microarray data contrasting two known conditions, researchers at Harvard Medical School and Washington University in St. Louis decided to look at the nucleotide probe sequences used to measure gene expression on the most widely used commercial microarray technology. They found that in many cases the sequences did not match the most current information.

In this study, they undertook a global analysis of the microarrays and systematically attempted to confirm the accuracy of individual probe sequences. They looked at every probe on the array to see whether it corresponded with the gene it was intended to measure. They found that a substantial percentage of the probe sequences (sometimes as much as 20%, on both older and currently used platforms) did not perfectly correspond with the appropriate mRNA as defined by the reference sequence (RefSeq).
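The kind of check described above can be illustrated with a minimal Python sketch. It is not the researchers' actual pipeline: it assumes that "perfectly correspond" means the probe (or its reverse complement) occurs as an exact substring of the reference mRNA, and the function names, probe identifiers and accession strings are hypothetical placeholders chosen for illustration.

```python
# Illustrative sketch only; not the method used in the study.
# Assumption: a probe counts as "sequence-verified" when it, or its reverse
# complement, appears as an exact substring of the reference mRNA sequence.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_sequence_verified(probe: str, reference_mrna: str) -> bool:
    """True if the probe matches the reference exactly on either strand."""
    probe = probe.upper()
    reference = reference_mrna.upper()
    return probe in reference or reverse_complement(probe) in reference


def verify_probes(probes: dict, refseq: dict) -> dict:
    """Check every probe against the reference for its intended target.

    probes: hypothetical mapping of probe id -> (target accession, probe sequence)
    refseq: hypothetical mapping of accession -> reference mRNA sequence
    A probe whose target accession is missing from refseq is reported as unverified.
    """
    results = {}
    for probe_id, (accession, sequence) in probes.items():
        reference = refseq.get(accession)
        results[probe_id] = (
            reference is not None and is_sequence_verified(sequence, reference)
        )
    return results


if __name__ == "__main__":
    # Toy data with made-up accessions and sequences.
    refseq = {"NM_TEST": "ATGGCCAAGTTGACCGGT"}
    probes = {
        "probe_1": ("NM_TEST", "GCCAAGTT"),     # exact match
        "probe_2": ("NM_TEST", "GCCTAGTT"),     # one mismatch
        "probe_3": ("NM_MISSING", "ATGG"),      # target not in reference set
    }
    print(verify_probes(probes, refseq))
```

An exact-substring test is the strictest possible criterion; a production analysis would more likely use a sequence alignment tool and would need to decide which strand each platform's probes are meant to match.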

Research at Harvard's Brigham & Women's Hospital

The study, entitled "Increased measurement accuracy for sequence-verified microarray probes," will appear in the August 2004 edition of Physiological Genomics, one of 14 journals published by the American Physiological Society.

Researchers Brigham H. Mecham, Daniel Z. Wetmore and Thomas J. Mariani worked in the Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women's Hospital (BWH) at Harvard Medical School, Boston; Zoltan Szallasi and Isaac Kohane were at the Children's Hospital Informatics Program of Harvard Medical School; and Yoel Sadovsky was at the Department of Obstetrics and Gynecology, Washington University School of Medicine, St. Louis, MO.

The work in this paper was supported by the Harvard Lung Biology Center, HL071885 (TJM), ES11597-01 (YS) and the Francis Families Foundation.

Results

The researchers found many causes for the probe sequence inaccuracies, the most notable being the constant improvement of sequence information databases over time. Regardless of the nature of the inaccuracies, the study clearly shows that sequence-verified probes perform more consistently, and with higher accuracy, within replicates and across different versions of the technology.

They note that the leading manufacturer of such microarrays "apparently…has come to the same conclusion and has recently released a platform containing RefSeq-verified probes."

Based on a comprehensive analysis of probe sequences on the 20 most common mammalian microarray platforms, the researchers found that data derived from verified probes showed greater accuracy than data from unverified probes:

  • Between technical replicates
  • Across generations of same-platform technology
  • In comparisons between different technology platforms
  • When comparing patient-oriented data from multiple, independent diagnostic microarray studies.

After identifying the limitations of the probe sequences, they used this information to improve the application of the technology. On the diagnostic side, they tested the effects of probe sequence accuracy in data from two independent breast cancer expression profiling studies. Their results indicate that restricting data to sequence-verified probes can improve the diagnostic power of microarray technology.

Discussion and data availability

The researchers stress that the result did not address a particular classification scheme. Rather, it indicated that removing unverified probe sets allowed the major component of change to be related to the underlying biology (in this data set, breast cancer) rather than to the source of the experiments.

"As combining data from multiple microarray platforms/technologies is certain to prove a common method, our results showing increased accuracy of sequence-verified probes across platforms (oligo vs. oligo and oligo vs. cDNA) substantiate the importance of using the most reliable information to verify equivalence of measurement across technologies," the researchers conclude.

The authors have created a website for checking sequences and measurements on microarrays for the 10 most common platforms, a number that will probably rise to 26 relatively soon. Called the "Lung Transcriptome," it was designed and built by Brigham Mecham, B.S., and Thomas Mariani, Ph.D., to serve "as both a microarray data repository and source for information and analytical tools for functional genomics-based, pulmonary-focused research applications." It can be found at http://lungtranscriptome.bwh.harvard.edu.

###

Source: Physiological Genomics, July 2004, one of 14 journals containing almost 4,000 articles annually, published by the American Physiological Society.

Editor's note: A copy of the research paper by Mecham et al. is available in PDF format to the media. Members of the media are encouraged to obtain an electronic version and to interview members of the research team. To do so, please contact Mayer Resnick at APS 301-634-7209, cell 301-332-4402 or mresnick@the-aps.org.

The American Physiological Society was founded in 1887 to foster basic and applied bioscience. The Bethesda, Maryland-based society has more than 10,000 members and provides a wide range of research, educational and career support to further the contributions of physiology to understanding the mechanisms of diseased and healthy states.

