One of the emerging, and soon to be defining, characteristics of scientific research is the collection, use and storage of immense amounts of data. In fields as diverse as medicine, astronomy and economics, large datasets are becoming the foundation for new scientific advances.
A new project led by University of Notre Dame researchers will explore solutions to the problems of preserving data, analysis software and computational workflows, and of linking these to the results obtained from analyzing large datasets.
Titled "Data and Software Preservation for Open Science (DASPOS)," the National Science Foundation-funded $1.8 million program is focused on high energy physics data from the Large Hadron Collider (LHC) and the Fermilab Tevatron.
The research group is led by Mike Hildreth, a professor of physics; Jarek Nabrzyski, director of the Center for Research Computing with a concurrent appointment as associate professor of computer science and engineering; and Douglas Thain, associate professor of computer science and engineering. The team will also survey and incorporate the preservation needs of other research communities, such as astrophysics and bioinformatics, where large datasets and the results derived from them are becoming the core of emerging science.
"The program will include several international workshops and the design of a prototype data and software preservation architecture that meets the functionality needed by the scientific disciplines," Hildreth said. "What is learned from building this prototype will inform the design and construction of the global data and software preservation infrastructure for the LHC, and potentially for other disciplines."
The multidisciplinary DASPOS team includes particle physicists, computer scientists and digital librarians from Notre Dame, the University of Chicago, the University of Illinois at Urbana-Champaign, the University of Nebraska–Lincoln, New York University and the University of Washington.