Cold Spring Harbor, NY -- In our daily lives, clutter is something that gets in our way, something that makes it harder for us to accomplish things. For doctors and scientists trying to parse mountains of raw biological data, clutter is more than a nuisance; it can stand in the way of figuring out how best to treat someone who is very sick.
Using increasingly cheap and rapid methods to read the billions of "letters" that comprise human genomes -- including the genomes of individual cells sampled from cancerous tumors -- scientists are generating far more data than they can easily interpret.
Today, two scientists from Cold Spring Harbor Laboratory (CSHL) publish a mathematical method of simplifying and interpreting genome data bearing evidence of mutations, such as those that characterize specific cancers. Not only is the technique highly accurate; it has immediate utility in efforts to parse tumor cells, in order to determine a patient's prognosis and the best approach to treatment.
CSHL Assistant Professor Alexander Krasnitz, who developed the new technique jointly with American Cancer Society Professor Michael Wigler, explains that it reduces the burden of interpretation by identifying what he and Wigler call COREs, an acronym for "cores of recurrent events."
Figure: When genome sequence data from 100 cells sampled from a single human tumor is analyzed, and the mathematical algorithm devised by Krasnitz and Wigler is applied, the rich structure of the data emerges. This is a "heat map" in which each horizontal row contains data from 1 of the 100 sampled cells, and each vertical column contains information about the presence (black) or absence (no mark) of a "CORE." Each CORE represents a place in the genome of a particular cell that has either amplified DNA (blue bar, top) or deleted DNA (red bar, top). From the mass of data underlying these phenomena, signatures of 4 subpopulations of tumor cells now become visible. The four groups and their evolutionary relations are shown along the left vertical axis: about half are "green," and are normal; the red group, consisting of only 4 of the 100 cells, turns out, genetically, to be the most mutated and dangerous subgroup in this tumor.
Consider the example of a cancerous breast tumor. Central to the CORE concept is what Krasnitz and Wigler refer to as "intervals." An example of an interval would be a segment of DNA that is missing in the genetic sequence of one or more cells sampled from the tumor. Tumor cells are often missing DNA that should normally be present; or conversely, they often have genome intervals in which the normal DNA sequence is amplified -- it appears in multiple copies. Such deletions and amplifications are called copy-number variations, or CNVs.
"In cancer," says Krasnitz, "we find intervals in the genome that are hit again and again. You might see this in many cells coming from a single patient's tumor; or you may see these repeating patterns in cells sampled from many patients with a similar cancer type."
In either case, if you superimpose the location of each "hit" -- whether a deletion or an amplification of DNA -- against a map of the full human genome, "you end up with these wobbly pile-ups, stacks of 'hits' at the same locations in the genome."
Due to the vagaries of collecting genome data and a certain amount of small-scale variation in the precise boundaries of the deleted or amplified DNA intervals, the stacks don't line up straight; as Krasnitz says, they look "wobbly." This makes them very hard to accurately interpret.
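To make the idea of a "wobbly" pile-up concrete, here is a minimal Python sketch that stacks a handful of intervals and reports how many of them cover each stretch of the genome. It is not the CORE algorithm described in the paper; the `Interval` record, the `pileup_depth` function and the coordinates are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical record: one deleted or amplified interval seen in one cell.
# Coordinates are half-open, [start, end), in arbitrary genome units.
Interval = namedtuple("Interval", ["cell", "start", "end"])


def pileup_depth(intervals):
    """Return (position, depth) breakpoints: depth holds from each position
    up to the next breakpoint."""
    events = []
    for iv in intervals:
        events.append((iv.start, 1))   # an interval opens here
        events.append((iv.end, -1))    # an interval closes here
    events.sort()

    profile = []
    depth = 0
    for pos, delta in events:
        depth += delta
        if profile and profile[-1][0] == pos:
            # Several events at the same position: keep only the final depth.
            profile[-1] = (pos, depth)
        else:
            profile.append((pos, depth))
    return profile


# Made-up data: three cells hit at roughly the same place, with slightly
# different boundaries, plus one unrelated hit elsewhere.
hits = [
    Interval("cell_01", 100, 205),
    Interval("cell_02", 98, 200),
    Interval("cell_03", 103, 210),
    Interval("cell_04", 500, 560),
]

profile = pileup_depth(hits)
for pos, depth in profile:
    print(pos, depth)
```

In this toy input the three overlapping hits never share exact boundaries, so the depth climbs and falls in steps rather than as one clean block. That ragged edge is the kind of wobble the quoted passage describes.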
The CORE method he and Wigler describe in a paper appearing in Proceedings of the National Academy of Sciences "is a mathematical way of cleaning up this mess and untangling these stacks of data, which often overlap." When data from 100 cells from a single tumor are analyzed, for example, and the mathematical algorithm devised by Krasnitz and Wigler is applied, the regularity of the stacks is revealed, and the rich structure of the data emerges.
In the example of analyzing 100 cells from one tumor, the net result is that populations and subpopulations of cancer cells can be distinguished; and if the cancer has already become metastatic, CORE will be useful in discerning the relations among cancer cell subpopulations in various parts of the body. Such analysis is a potentially valuable guide to prognosis and can also help inform important treatment decisions.
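As a rough illustration of how subpopulations might be read off once each cell is described by the COREs it carries, the Python sketch below builds a presence/absence table (one row per cell, one column per CORE, as in the heat map) and groups cells with identical rows. The cell names, CORE labels and grouping rule are assumptions made for this example; the paper's own analysis is not reproduced here, and a real study would more likely use a clustering or phylogenetic method that tolerates noise.

```python
from collections import defaultdict


def incidence_matrix(cell_cores, core_names):
    """Map each cell to a tuple of 0/1 flags, one per CORE in core_names."""
    return {
        cell: tuple(int(name in cores) for name in core_names)
        for cell, cores in cell_cores.items()
    }


def group_by_profile(matrix):
    """Group cells that share exactly the same presence/absence row."""
    groups = defaultdict(list)
    for cell, row in sorted(matrix.items()):
        groups[row].append(cell)
    return dict(groups)


# Hypothetical CORE labels and per-cell observations.
core_names = ["amp_A", "del_B", "del_C"]
cell_cores = {
    "cell_01": set(),                 # no events: looks normal
    "cell_02": set(),
    "cell_03": {"amp_A"},
    "cell_04": {"amp_A", "del_B"},
    "cell_05": {"amp_A", "del_B"},
    "cell_06": {"amp_A", "del_B", "del_C"},
}

matrix = incidence_matrix(cell_cores, core_names)
groups = group_by_profile(matrix)
for row, cells in groups.items():
    print(row, cells)
```

Exact matching is the simplest possible grouping rule and is used here only to keep the sketch short.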
"Target inference from collections of genomic intervals" appears online today ahead of print in Proceedings of the National Academy of Sciences. The authors are: Alexander Krasnitz, Guoli Sun, Peter Andrews and Michael Wigler. The paper can be obtained online at: http://www.
About Cold Spring Harbor Laboratory
Founded in 1890, Cold Spring Harbor Laboratory (CSHL) has shaped contemporary biomedical research and education with programs in cancer, neuroscience, plant biology and quantitative biology. CSHL is ranked number one in the world by Thomson Reuters for impact of its research in molecular biology and genetics. The Laboratory has been home to eight Nobel Prize winners. Today, CSHL's multidisciplinary scientific community is more than 360 scientists strong, and its Meetings & Courses program brings more than 12,500 scientists from around the world each year to its Long Island campus and its China center. Tens of thousands more benefit from the research, reviews, and ideas published in journals and books distributed internationally by CSHL Press. The Laboratory's education arm also includes a graduate school and programs for undergraduates as well as middle and high school students and teachers. CSHL is a private, not-for-profit institution on the north shore of Long Island. For more information, visit http://www.