It takes a global village to monitor and analyze trends in Earth's "breathing"--or the exchange of carbon dioxide, water vapor and energy between vegetation on the ground and the planet's atmosphere.
Today, hundreds of science groups in multiple countries have planted more than 500 micrometeorological towers across five continents to monitor these exchanges every 30 minutes. Each group gathers, stores, and sorts its data in its own way. Yet, in order to accurately track and predict long-term trends in climate--on local, regional and global scales--researchers need to bring together and navigate through these thousands of disparate datasets.
"I ultimately want understand how biophysical factors regulate carbon and water dynamics in terrestrial ecosystems at multiple spatial and temporal scales. Thus, to look at this issue, it is not enough to just collect data from one site or tower and here is where FLUXNET (the global consortium of eddy covariance towers) provide an invaluable opportunity to the research community," says Rodrigo Vargas, a UC Berkeley researcher.
He notes that until recently, one of the biggest roadblocks to his research has been gathering and collaborating data--a process that includes, locating towers that have been planted by science teams around the world, determining which researchers monitor each tower site, and then making sense of their datasets. "It was so time consuming just to obtain and collaborate on data that many people just didn't pursue questions about our planet's breathing over large distances or decades," adds Vargas.
Realizing the scientific benefits of collaborating on and harmonizing data at a global level, a team of environmental scientists and database specialists from the Lawrence Berkeley National Laboratory, the University of Virginia, Microsoft Research, the Max Planck Institute for Biogeochemistry, the University of California, Berkeley, the University of Tuscia and the Berkeley Water Center joined forces to create an online collaboration portal called Fluxdata.org.
"Fluxdata.org gives researchers access to field-data collected at hundreds of sites around the globe, with a few clicks of the mouse. This information allows them to expand the scope of their studies to spatial and temporal scales that would otherwise be impossible to do with a single measurement," says Deb Agarwal, who led the development of Fluxdata.org and heads the Berkeley Lab's Advanced Computing for Sciences Department. Agarwal and Marty Humphrey at the University of Virginia currently maintain the Fluxdata.org database and web portal, which are hosted on Berkeley Lab servers.
Fluxdata.org was originally built with data collected at the 2007 FLUXNET collaboration meeting in La Thuile, Italy. FLUXNET is essentially a global collaboration of regional research teams that measure the "carbon flux" in various ecosystems. Each regional team is composed of individual research groups that pursue their own scientific interests. In recent years, the collaboration has also invited new and existing research groups to incorporate additional information into the database.
"By giving these individual groups an incentive to add their data to Fluxdata.org, we've managed to create a database that contains input from 250 plus sites, and more than 960 site-years of data and it continues to grow," says Agarwal. A site-year is the amount of data a single tower collects in a year.
By agreeing to share their information on Fluxdata.org, researchers can use field data collected at other sites in their own research, as well as track the published papers that benefit from their site's data. Because the portal requires that everybody submit their data in the same format, researchers can easily use the data to analyze fluctuations in the planet's carbon balance, regionally and globally, over decades.
According to Vargas, just four years ago, it was common practice for researchers to archive site data in spreadsheets on their personal computers. To study how the Earth recycles carbon dioxide, water vapor and energy on a regional or global scale, researchers would either e-mail spreadsheets back and forth or travel to different sites to manually download data from each one. However, because each site formats its data according to the scientific interest it is pursuing, the researcher would still need to manually import and reformat the data into a uniform layout before any analysis could be done.
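To make that reformatting step concrete, here is a minimal Python sketch of mapping two differently labeled site spreadsheets onto one shared layout. Everything in it is hypothetical: the site identifiers, the column names, the sample values and the common layout are invented for illustration and do not describe the actual Fluxdata.org format.

```python
import csv
import io

# Hypothetical per-site header mappings (site column -> common column).
# Real sites and the real Fluxdata.org layout will differ.
SITE_COLUMN_MAPS = {
    "site_a": {"time": "timestamp", "Fc": "co2_flux", "LE": "latent_heat_flux"},
    "site_b": {"DateTime": "timestamp", "CO2": "co2_flux",
               "LatentHeat": "latent_heat_flux"},
}


def harmonize(site_id, csv_text):
    """Convert one site's CSV text into rows that share a common layout."""
    column_map = SITE_COLUMN_MAPS[site_id]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {"site": site_id}
        for source_name, common_name in column_map.items():
            value = raw.get(source_name) or ""
            if common_name == "timestamp":
                row[common_name] = value
            else:
                # Empty cells become None so gaps stay visible downstream.
                row[common_name] = float(value) if value.strip() else None
        rows.append(row)
    return rows


# Made-up sample data: two sites, same instant, different headers.
site_a_csv = "time,Fc,LE\n2007-06-01T00:30,-1.2,35.0\n"
site_b_csv = "DateTime,CO2,LatentHeat\n2007-06-01T00:30,,41.5\n"
combined = harmonize("site_a", site_a_csv) + harmonize("site_b", site_b_csv)
for row in combined:
    print(row)
```

Once every site's rows share the same keys, they can be concatenated and analyzed together. Doing this by hand for each new spreadsheet is the burden Vargas describes.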
"This is why the Fluxdata.org repository is so fantastic. With instant access to hundreds of site years of information that is in the same format, one can ask questions that we wouldn't have been able to ask previously. Now it is possible to look at regional and global trends across decades, and if one has any questions about the dataset, we know exactly who to ask," says Vargas, who did his postdoctoral research at UC Berkeley.
When a site is collecting data every 30 minutes for 24 hours a day, 365 days a year, something is bound to go awry.
"Events like instrument malfunction and poor weather will alter the quality of your data. Because you can't turn back time to collect good data, many researchers have gaps in their datasets," says Dennis Baldocchi, a professor of environmental science at UC Berkeley and Vargas' former advisor. He is also the principal investigator for the FLUXNET collaboration.
Because researchers need data every half-hour to accurately track trends in an ecosystem's breathing, Baldocchi notes that researchers estimate (or gap-fill) missing data using mathematical algorithms. Although these algorithms have been around for several years, researchers typically applied them manually. However, with the help of researchers from the University of Tuscia and the Max Planck Institute for Biogeochemistry, there is now a tool behind the Fluxdata.org portal that does this automatically.
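The article does not say which algorithms the Fluxdata.org tool applies, so the Python sketch below is only an illustration of the general idea of gap-filling. It uses a simple approach often called mean diurnal variation: a missing half-hourly value is replaced by the average of valid readings taken at the same time of day on nearby days. The function name, the seven-day window and the sample series are assumptions made for this example.

```python
from statistics import mean

SLOTS_PER_DAY = 48  # one slot per 30-minute interval


def fill_gaps(values, window_days=7):
    """Return a copy of `values` in which each None is replaced by the mean
    of valid readings at the same time of day within +/- `window_days`.
    Gaps with no valid neighbour in the window are left as None."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        neighbours = []
        for day_offset in range(-window_days, window_days + 1):
            j = i + day_offset * SLOTS_PER_DAY
            # Only original measurements are used, never other filled values.
            if 0 <= j < len(values) and values[j] is not None:
                neighbours.append(values[j])
        if neighbours:
            filled[i] = mean(neighbours)
    return filled


# Made-up series: three identical days, with one reading lost on day two.
series = [float(slot) for _ in range(3) for slot in range(SLOTS_PER_DAY)]
series[SLOTS_PER_DAY + 10] = None
filled = fill_gaps(series)
print(filled[SLOTS_PER_DAY + 10])  # 10.0
```

Production gap-filling methods typically also account for weather conditions at the time of the gap. This sketch ignores them to keep the principle visible.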
"Productivity in our field has increased exponentially as a result of tools like Fluxdata.org, and the proof is in our publications. In the 1990s we published tens of papers per year as a community, now we are publishing several hundreds of papers annually," says Baldochhi. In fact, in July 2010 alone, two research teams that have reaped the benefits of Fluxdata.org were published in the journal Science, he notes.
Although Fluxdata.org is not the FLUXNET collaboration's first attempt to create a global data repository, it is one of the first to provide the community specific software tools to distribute and navigate through the database in an effective and efficient manner.
According to Agarwal, these new software tools became increasingly important as the FLUXNET collaboration grew to more than 500 sites. She notes that the group's first "Marconi Database," produced in 2000, contained only 97 site-years of data from 39 European and North American sites.
"As long as these towers are operating, they will collect information every half-hour. In addition, the number of sites that want to join FLUXNET is increasing with time, so the need for tools to organize, navigate and distribute this data is becoming increasingly important," says Agarwal.
She notes that because FLUXNET is a global network of regional networks and participating science groups receive funding from different sources, one of the greatest challenges to building collaborative tools like Fluxdata.org is getting an agency to take ownership of this work and fund it.
"This type of research is not work that requires a supercomputer or an expert with a unique set of skills. These tools are built with open source and commodity software and hardware," says Agarwal. "This work is on the border between computer science and science research, and it will be vital for accurately understanding and predicting changes in our planet's climate and weather."