Can publicly available data from large-scale social media networks be used to help predict catastrophic events within the country's infrastructure, such as threats to national security, the energy system or even the economy?
Conrad Tucker, associate professor of engineering design and industrial engineering at Penn State, has received funding from the U.S. Air Force to investigate whether crowd-sourced data from social media can be used to not only detect threats, but also prevent catastrophic events from happening in the future.
Tucker received $342,995 for the three-year project, titled "Transforming Large Scale Social Media Networks into Data-Driven, Dynamic Sensing Systems for Modeling and Predicting Real World Threats."
"The challenge with using data that comes from a group is that people -- or algorithms -- can be unreliable," explains Tucker. "So the major thrust of this project is to create algorithms that increase the reliability of the information that you can acquire from these publicly available sources."
One of the obstacles researchers face in advancing machine learning -- or using computers to predict outcomes -- is the acquisition of high-quality data. Large data sets are typically costly and time-consuming to obtain. However, with the emergence of the internet and social media networks, the availability of data is less of a challenge, and large sets of data from publicly sourced social networks are becoming more readily available.
"We live in an increasingly digitally connected world, and this connectivity actually presents challenges, like volatility," said Tucker. "If one CEO's tweet can send a stock's price down billions of dollars, that is a huge threat to the company and its stakeholders. That is just one example of what we are looking to model with new algorithms that can analyze and predict such chaos."
Researchers in other fields, such as health care, have already used publicly available data as a decision-making tool. In health care, they have explored using it for disease surveillance and for tracking the spread of epidemics.
"Regardless of what domain you are looking at, the fundamental problem is when you go to sample data or acquire information, how do you know which pieces of data to include in your model and how do you know which ones to leave out," said Tucker. "That's the biggest problem and the area in which this study is seeking to make one of the more significant contributions in this space."
Because of the growing prevalence of connectivity worldwide, new threats continue to emerge. Predicting those threats is also a major part of this project.
"One of the major threats to society in the 21st century is the integrity of information ... how do people decipher what's real and what's fake? How do you start preventing misinformation from being disseminated via social media?" asked Tucker. "I think that's going to be a very difficult notion to combat, especially as algorithms become better at generating human-readable text and images. I don't have the answer yet but hopefully this is a good start to finding out how we can get there."
Tucker's collaboration with the Air Force started in 2014 when he was selected to participate in the U.S. Air Force Summer Faculty Fellowship Program. He participated in the program again in 2015.
Tucker's research at Wright-Patterson Air Force Base in 2014 and 2015 led to the development of his current project.
"I was working with Dr. Ken Hopkinson (professor of computer science at the Air Force Institute of Technology) on this idea that we can use data from social media networks as a real-time sensor," said Tucker. "We looked at different ways we could use this publicly available data, such as whether it was possible to predict electricity utilization or weather patterns based on what people observed and shared on social media. This project is a natural evolution of the previous research."