The Earth System Grid Federation, a multi-agency initiative that gathers and distributes data for top-tier projections of the Earth’s climate, is preparing a series of upgrades that will make using the data easier and faster while improving how the information is curated.
The federation, led by the Department of Energy’s Oak Ridge National Laboratory in collaboration with Argonne and Lawrence Livermore national laboratories, is integral to some of the most important, impactful and widely respected projections of the Earth’s future climate: those made by scientists working with the Coupled Model Intercomparison Projects for the World Climate Research Programme.
“ESGF data are about the future of life on Earth,” said Forrest Hoffman, who leads ESGF at ORNL. “By providing scientists easy access to the full collection of international models, ESGF enables them to make the very best guess about the future trajectory of our climate.”
A key ESGF mission is to support the data needs of scientists who prepare the United Nations Intergovernmental Panel on Climate Change’s comprehensive climate assessments, released every six to seven years. ESGF data underpin IPCC landmark reports such as the recent Sixth Assessment Report, AR6, and its working group findings. The data also inform IPCC special reports focused on climate vulnerabilities, adaptation scenarios and mitigation strategies.
Another important aspect of the ESGF’s mission is to ensure that scientific investigation is transparent, collaborative and reproducible, given its direct impact on worldwide climate research and potential use in decision-making.
“All of the Earth system model data that go into the IPCC reports and all of the most important simulations of the climate from around the world are stored in the ESGF and made accessible by the services we provide,” said Forrest Hoffman, lead for ESGF and the Computational Earth Sciences group at ORNL. “The federation gets data into the hands of the tens of thousands of researchers who analyze it and compare it with observational data to constantly update our best projections of the future.”
In a new iteration of the ESGF project, computational scientists are working to improve data discovery, access and storage. The work will rely on the latest software tools, cloud computing resources, the world’s most powerful supercomputers and DOE’s Energy Sciences Network, or ESnet. ESnet currently enables 100 gigabit-per-second transfer rates among national laboratories and connections to national and international universities and research centers. An upgrade expected by the end of the year will boost ESnet transfer rates to as much as 400 Gbps.
The federation operates as a network of large computer nodes hosted in the United States and 17 other countries, functioning in tandem as one large data archive. ORNL, ANL and LLNL are working to improve the reliability and scalability of the system, providing a smooth data replication process that ensures the broader scientific community has access to data from all ESGF partners. ORNL and ANL will also host a dual backup of the more than 10 petabytes (and counting) of ESGF cumulative data and models, taking advantage of the world-class computing systems hosted at the labs.
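For a rough sense of scale, the short Python sketch below estimates how long a full copy of a 10-petabyte archive would take at the 100 Gbps and 400 Gbps ESnet rates. It is illustrative arithmetic only. It assumes decimal units (1 petabyte = 10^15 bytes) and a transfer that sustains the full line rate with no protocol overhead or storage bottlenecks, which real transfers would not achieve. The function name `ideal_transfer_days` is hypothetical and not part of any ESGF or ESnet tool.

```python
# Assumption: decimal petabyte (10^15 bytes); not stated in the article.
PETABYTE_BYTES = 10**15
SECONDS_PER_DAY = 86_400


def ideal_transfer_days(size_pb: float, rate_gbps: float) -> float:
    """Best-case days needed to move size_pb petabytes at rate_gbps.

    Assumes the link is fully used the whole time, so this is a lower
    bound rather than a realistic estimate.
    """
    total_bits = size_pb * PETABYTE_BYTES * 8
    seconds = total_bits / (rate_gbps * 10**9)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (100, 400):
        days = ideal_transfer_days(10, rate)
        print(f"{rate} Gbps: about {days:.1f} days for 10 PB")
```

Under these idealized assumptions, the copy takes a little over nine days at 100 Gbps and a little over two days at 400 Gbps, a fourfold reduction.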
Developing robust user interfaces and secure, reliable archives
The multiyear upgrade project has already replicated existing data and is providing the storage and computational services needed to dynamically generate data for the user community while it builds out new infrastructure and services. ESGF has created a roadmap to guide its development work.
ORNL brings substantial experience with big data centers and large-scale modeling and simulation to its leadership role in ESGF. The lab is home to the Oak Ridge Leadership Computing Facility, a DOE Office of Science user facility whose Frontier exascale computing system was recently ranked as the world’s fastest, as well as the Climate Change Science Institute, which brings together data experts, modelers and experimentalists to accelerate understanding of climate change and its impacts.
“ORNL is in the unique position of knowing about big data and also knowing about climate and serving as host to very large data centers and the interfaces that make that information easily accessible by scientists around the world,” Hoffman said.
The Argonne Leadership Computing Facility, or ALCF, lends the federation its unique capabilities, along with the Globus research data management system, which the University of Chicago operates for the research community. Globus services will be used in the upgraded ESGF for authentication and for data indexing, access and replication.
“The terabytes and petabytes generated by the climate models of today require new approaches to data management and analysis,” said Ian Foster, ANL lead for the project. “We will enable not only faster download of data subsets, but also previously infeasible data analyses on ANL and ORNL supercomputers.” ALCF is a DOE Office of Science user facility.
Lawrence Livermore also brings a wealth of high-performance computing and data center expertise, creative technologies and software solutions to ESGF, along with its experience as the federation’s initial lead.
“The upgrades will make it easier and faster for users to access the data that can help us better understand what climate will look like in the future,” said Sasha Ames, LLNL lead for the federation.
ESGF is sponsored by the Biological and Environmental Research program within DOE’s Office of Science and co-funded by the National Aeronautics and Space Administration, the National Oceanic and Atmospheric Administration and the National Science Foundation, with the support of numerous international laboratory and academic partners.
UT-Battelle manages ORNL for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.