Galaxy -- an open-source, web-based platform for data-intensive biomedical and genetic research -- is now available as a "cloud computing" resource. A team of researchers, including Anton Nekrutenko, an associate professor of biochemistry and molecular biology at Penn State University; Kateryna Makova, an associate professor of biology at Penn State; and James Taylor from Emory University, developed the new technology, which will help scientists and biomedical researchers harness tools such as DNA-sequencing and analysis software, as well as storage capacity for large quantities of scientific data. Details of the development will be published as a letter in the journal Nature Biotechnology. Earlier papers by Nekrutenko and co-authors describing the technology and its uses are published in the journals Genome Research and Genome Biology.
Nekrutenko said that he and his team first developed the Galaxy computing system (http://galaxyproject.
Now, Nekrutenko's team has taken Galaxy to the next level by developing an "in the cloud" option using, for example, the popular Amazon Web Services cloud. "A cloud is basically a network of powerful computers that can be accessed remotely without the need to worry about heating, cooling, and system administration. Such a system allows users, no matter where they are in the world, to shift the workload of software storage, data storage, and hardware infrastructure to this remote location of networked computers," Nekrutenko explained. "Rather than run Galaxy on one's own computer or use Penn State's servers to access Galaxy, now a researcher can harness the power of the cloud, which allows almost unlimited computing power."
As a case study, the authors report on recent research published in Genome Biology in which scientists, with the help of Ian Paul, a professor of pediatrics at Penn State's Hershey Medical Center, used Galaxy Cloud to analyze DNA from nine individuals across three families. Thanks to the enormous computing power of the platform, the researchers were able to identify four heteroplasmic sites -- variations in mitochondrial DNA, the part of the genome passed exclusively from mother to child.
"Galaxy Cloud offers many advantages other than the obvious ones, such as computing power for large amounts of data and the ability for a scientist without much computer training to use DNA-analysis tools that might not otherwise be accessible," Nekrutenko said. "For example, researchers need not invest in expensive computer infrastructure to be able to perform data-intensive, sophisticated scientific analyses."
Yet another advantage of Galaxy Cloud is its data-storage capacity. Using the Amazon Web Services cloud, researchers have the option of storing vast amounts of data in a secure location. "There are emerging technologies that will produce 100 times more data than existing 'next-generation' DNA sequencing, which already has reached the point where even more storage becomes an issue, not to mention analysis," Nekrutenko said.
In addition to Nekrutenko, Makova, and Taylor, other authors of the research report include Nate Coraor and Hiroki Goto of the Center for Comparative Genomics and Bioinformatics at Penn State and Enis Afgan and Dannon Baker of the Department of Biology and the Department of Mathematics and Computer Science at Emory University. Galaxy Cloud development was supported primarily by the U.S. National Institutes of Health and the U.S. National Science Foundation. Additional funding was provided by the Pennsylvania Department of Health.
[ Katrina Voss ]
A high-resolution image associated with this research is online at http://www.
Image caption: The Galaxy platform helps researchers to analyze vast quantities of DNA-sequence data.
Credit: National Institutes of Health