NEW BRUNSWICK, N.J. - The National Science Foundation (NSF) has awarded two Rutgers researchers nearly $1.4 million as part of an initiative to extract useful information from so-called "big data" - massive collections of data from sources such as scientific instruments, digital images, social media streams and business transactions.
While it's easy to dismiss big data as the latest business and scientific buzzword, experts believe that new ways of probing enormous quantities of data will reveal previously unseen trends. For example, utilities could analyze billions of electric meter readings to precisely predict power consumption. Stock markets could monitor millions of trades as they happen to uncover new types of fraud. And businesses could refine products and practices by studying sentiments in countless tweets and blog posts. Current technologies often come up short in performing these types of fast and massive analyses.
The two Rutgers grants are part of $15 million in NSF funding announced today under the agency's big data research initiative, which was launched earlier in the year. Both are for collaborative research with other universities.
One project - to develop technology that maintains indexes 200 times faster in databases with billions of entries - is a joint effort with Stony Brook University. Rutgers will receive $400,000 of the $1.2 million awarded to the collaboration.
The second project, to improve the accuracy and relevance of complex scientific literature searches, is a joint effort with Cornell and Princeton universities. Rutgers will receive almost $1 million of the $3 million awarded to the collaboration.
The projects involving Rutgers are among eight that received funding today from the NSF and the National Institutes of Health, six months after the agencies announced their big data initiative at an event led by the White House Office of Science and Technology Policy.
The Rutgers investigator on the project to speed information retrieval from large databases is Martin Farach-Colton, professor in the Department of Computer Science in the School of Arts and Sciences.
"Big databases have trouble finding things if indexes aren't maintained," said Farach-Colton. "Traditional methods of indexing databases, based on methods developed 40 years ago, start to fail when databases get large."
He likens the older methods to sending a delivery truck halfway across the country to pick up a single package each time a customer places an order. The researchers are studying new ways to move index items more efficiently, much like filling delivery trucks with packages and sorting those packages at regional distribution centers on the way to their destinations.
The Rutgers investigator on the scientific literature search project is Paul Kantor, a professor in the Department of Library and Information Sciences in the School of Communication and Information.
"Scientists need to do the equivalent of Google searches on their literature," said Kantor, "but keywords alone are not sufficient to perform searches quickly and efficiently." The researchers are investigating methods to search on topics and concepts that would include collaborative information from other searchers to better determine the relevance of the results. These methods could further peg the value of a newly retrieved document by relating it to other documents that the searcher identifies as worthwhile. The researchers have been working with the "arXiv" online public archive of scientific papers run by Cornell University.
The NSF describes the eight projects it is funding as having applications in disciplines such as physics, psychology, economics and medicine.
Farnam Jahanian, assistant director for NSF's Directorate for Computer and Information Science and Engineering, says that data represents a transformative new currency for science, engineering, and education. "By advancing the techniques and technologies for data management and knowledge extraction," he said, "these new research awards help to realize the enormous opportunity to capitalize on the transformative potential of data."