It's a problem all too familiar to modern researchers: They're drowning in data. Say a sociologist in South Florida wants to study how flood risk affects the area's vulnerable, low-income communities. That researcher would need to integrate reams of complicated data, from flood maps, to building stock, to income distribution by region.
That's a tall order. Such data are collected by different agencies, stored in different formats, and protected by different levels of access. Urban sustainability questions in particular cut across mountains of data, and researchers have no easy way to use those data efficiently.
Too much data, without a means to integrate it, is the motivation behind a newly funded project led by Shrideep Pallickara, professor in the Colorado State University Department of Computer Science. Pallickara and a multidisciplinary team of university and government partners have been awarded $3 million from the National Science Foundation to develop a system for streamlining and managing vast datasets that could advance research in urban sustainability.
The five-year award is from the NSF's Cyberinfrastructure for Sustained Scientific Innovation program. The project team includes CSU collaborators Sangmi Pallickara and Sudipto Ghosh in the Department of Computer Science; Mazdak Arabi in the Department of Civil and Environmental Engineering; Jay Breidt in the Department of Statistics; and researchers from Arizona State University, University of California Irvine, and University of Maryland Baltimore County.
"The key here is to complement, enhance and catalyze what a researcher can do," said Shrideep Pallickara, whose expertise is in cloud infrastructure and large-scale cyberphysical systems. "We want to meet them where they are, rather than tell them to come meet us."
In other words, Pallickara and team want to help urban systems researchers do better science by clearing roadblocks caused by unwieldy datasets that don't talk to each other. Their secret sauce is a spatiotemporal "sketching" algorithm that essentially decouples these reams of data from the useful information hidden inside that data.
The methodology, which Pallickara's team has previously worked on, extracts and organizes information from raw datasets using simplified sketches of the data. These sketches are abbreviated versions of the much denser data, and are nimbler and easier to store. Think of a sketch as a hand drawing of a famous painting, like the Mona Lisa: the raw data are the original painting, while the sketch preserves the essential features of the original.
Once constructed, the sketch, not the original dataset, can be consulted to address research problems. Sketches from disparate fields can be overlaid multiple times, offering researchers powerful new ways to do richer, more complex analyses.
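To make the idea concrete, here is a rough illustration in Python. It is not the team's algorithm, which the article does not spell out. It assumes a sketch can be approximated by grouping observations into coarse grid cells and weekly time buckets and keeping only summary statistics for each. The names `CellSummary`, `cell_key`, `build_sketch` and `overlay`, the half-degree cell size, and all data values are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CellSummary:
    """Compact statistics standing in for every raw value in one cell."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count


def cell_key(lat, lon, day, cell_deg=0.5, days_per_bucket=7):
    """Map a point in space and time to a coarse (row, column, week) cell."""
    return (
        math.floor(lat / cell_deg),
        math.floor(lon / cell_deg),
        day // days_per_bucket,
    )


def build_sketch(records, cell_deg=0.5, days_per_bucket=7):
    """Summarize (lat, lon, day, value) records; raw values are not kept."""
    sketch = defaultdict(CellSummary)
    for lat, lon, day, value in records:
        sketch[cell_key(lat, lon, day, cell_deg, days_per_bucket)].add(value)
    return dict(sketch)


def overlay(sketch_a, sketch_b):
    """Pair up the summaries for cells that appear in both sketches."""
    shared = sketch_a.keys() & sketch_b.keys()
    return {key: (sketch_a[key], sketch_b[key]) for key in shared}


# Made-up values, for illustration only: flood depth in meters and
# household income in dollars, each as (latitude, longitude, day, value).
flood_records = [
    (25.76, -80.19, 3, 1.0),
    (25.80, -80.21, 5, 3.0),
    (26.70, -80.05, 3, 0.5),
]
income_records = [
    (25.77, -80.20, 1, 30000.0),
    (25.78, -80.22, 2, 50000.0),
]

flood_sketch = build_sketch(flood_records)
income_sketch = build_sketch(income_records)
combined = overlay(flood_sketch, income_sketch)

for key, (flood, income) in combined.items():
    print(key, flood.mean, income.mean)
```

In this toy version, the overlay step touches only the small summaries, never the raw records, which is the property the Mona Lisa analogy points to. Whatever the project's actual sketches look like, they would need to preserve far more structure than a count, a sum and a range.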
"Science at the speed of thought, is what we want," Pallickara said.
Over the course of five years, the researchers will work with data-rich organizations and agencies, including Google Earth, Esri, NSF National Ecological Observatory Network, National Center for Atmospheric Research, U.S. Army Corps of Engineers, and many others, to test and improve the project framework. Work funded by the grant will include development of data visualizations and model assessments from many different areas of the urban sustainability landscape.
The grant also funds STEM outreach efforts. Pallickara and team will host a weeklong summer camp for Fort Collins middle school students, teaching foundational mathematical and statistical concepts through experiential learning. The grant will also support development of a senior-level undergraduate course on spatiotemporal data analysis, which will cover creating and validating models using the sketch ecosystem.