Experts in the analysis of big data have noticed a curious pattern among those who tweet: Twitter accounts with the most followers are more likely to attract new ones. It's just one of the many interesting and useful nuggets revealed when researchers peer deeply into the remarkable ocean of data being stockpiled by social media sites.
Stanford researchers from diverse disciplines are developing new and better ways to find meaning in data. Their work has shown such promise that the Defense Advanced Research Projects Agency (DARPA) recently stepped up to the plate with a $5.6 million grant to support it.
The new project, called MEGA: Modern Graph Analysis for Dynamic Networks, is led by Associate Professor Ashish Goel of Stanford's Department of Management Science and Engineering. A team of seven principal investigators, six of them Stanford faculty, will develop algorithms that model human communication and detect subtle patterns in huge data sets from social media.
DARPA is interested because, from a national security standpoint, big data could help reveal threats hidden in the unusual or suspicious social interactions of terrorists and other foreign adversaries. But Goel, who also holds a courtesy appointment in computer science and serves on the technical advisory board of Twitter, Inc., said the models and algorithms MEGA develops will also influence social media itself, leading to a more sophisticated, personalized experience for all users.
HARNESSING ENORMOUS DATA SETS
Our daily social communication is spread across many forms of interaction. E-mails, tweets, text messages and Facebook posts define our modern social lives. More than ever, information about this correspondence and behavior can be collected, stored, and made available to computer scientists.
With access to billions of tweets, e-mails and text messages, a project like MEGA can build reliable mathematical models of social phenomena, such as the way news spreads through a network or even how people choose their social connections, Goel said. "From an intellectual point of view, it's really exciting."
One goal of the MEGA project is to model human online behavior and discover how it shapes social networks. The observation that well-followed Twitter accounts attract the most new followers is but one example. The team can then generalize such observed patterns into a more abstract theory and test whether it holds across many social networks.
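The rich-get-richer pattern among Twitter accounts is what network scientists call preferential attachment. The short Python sketch below illustrates it under a loudly stated assumption: a toy model in which each new follow lands on an account with probability proportional to that account's current follower count plus one. The function name `simulate_follows` and the model itself are hypothetical illustrations, not the MEGA team's actual model.

```python
import random
from collections import Counter


def simulate_follows(n_accounts: int, n_follows: int, seed: int = 0) -> Counter:
    """Toy preferential-attachment model (an illustrative assumption, not MEGA's model).

    Each new follow goes to an account chosen with probability proportional
    to (current followers + 1), so well-followed accounts attract more new ones.
    """
    rng = random.Random(seed)
    followers = Counter({account: 0 for account in range(n_accounts)})
    # The 'urn' holds one ticket per account plus one extra ticket per follower,
    # so a uniform draw from it samples accounts proportionally to followers + 1.
    urn = list(range(n_accounts))
    for _ in range(n_follows):
        target = rng.choice(urn)
        followers[target] += 1
        urn.append(target)  # the chosen account now has one more ticket
    return followers


counts = simulate_follows(n_accounts=1000, n_follows=20000, seed=42)
print(counts.most_common(5))  # the handful of accounts that ran away with the most followers
```

Running it with different seeds typically shows a few accounts collecting a disproportionate share of follows, even though every account starts out identical.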
The sheer volume of communication and the speed at which a network changes have created new challenges that, Goel said, more storage or more processing power alone cannot solve. When analyzing the torrent of data flowing out of popular social media sites like Twitter, for instance, what happened yesterday might as well have happened last century. What matters most is now. The MEGA team wants to analyze data as it arrives, not gather and organize it later.
"On a site like Twitter, you're not finding data that was there yesterday, you're finding data that was there last second. And even one second of this data is too big to process on a single machine," he said. To achieve real-time analysis, the data must be stored and explored across many different computers, which calls for still more new algorithms. This is the second component of MEGA's research: writing step-by-step procedures for processing distributed data in real time.
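One common way to spread a stream of data across machines is hash partitioning, in which each record is routed to the machine that "owns" its key. The single-process Python sketch below illustrates that general idea for follower counts. It is an assumption for illustration only, not MEGA's design: the class `ShardedFollowerCounts` and its methods are hypothetical, and each in-memory `Counter` merely stands in for a separate machine.

```python
import zlib
from collections import Counter


class ShardedFollowerCounts:
    """Hypothetical sketch: follower counts split across several workers.

    Each worker stands in for a separate machine. A follow event is routed
    to the worker that owns the followed account, so no single worker has
    to hold or process the whole stream.
    """

    def __init__(self, n_workers: int) -> None:
        self.workers = [Counter() for _ in range(n_workers)]

    def _owner(self, account: str) -> int:
        # CRC-32 is stable across processes, so every machine would agree
        # on which worker owns a given account.
        return zlib.crc32(account.encode("utf-8")) % len(self.workers)

    def record_follow(self, followee: str) -> None:
        # Update the count the moment the event arrives, rather than
        # batching events for later analysis.
        self.workers[self._owner(followee)][followee] += 1

    def followers(self, account: str) -> int:
        # Only the owning worker is consulted to answer a query.
        return self.workers[self._owner(account)][account]


counts = ShardedFollowerCounts(n_workers=4)
for name in ["alice", "bob", "alice"]:
    counts.record_follow(name)
print(counts.followers("alice"))  # 2
```

A real deployment would also have to deal with network communication, machine failures and accounts whose traffic overwhelms a single worker; the sketch shows only the routing idea.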
AGE-OLD PROBLEMS AND FUTURISTIC SOLUTIONS
Goel said the team has already had some early successes, and the group expects to publish high-impact results, in the form of new models and algorithms, within the project's first or second year.
Some of their algorithms and programs will be passed directly to DARPA for use in a security context, but the team is also tackling long-standing theoretical problems in computer science. One such problem, studied by Amin Saberi and his students, is the notoriously difficult "traveling salesman" problem: if a salesman has a list of cities to visit and must visit each one exactly once before returning to where he started, how can we calculate the shortest possible route?
This problem may seem unrelated to the world of social media, but at heart it concerns a network of access points, such as mobile phones or computers on the Internet, and an algorithm for finding the shortest route that connects them. Goel said it is important to keep making progress on such classical problems. Even when they don't have an immediate real-world application, he said, they advance our understanding of computer science as a discipline.
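To make the traveling salesman problem concrete, here is a minimal brute-force sketch in Python. It assumes the cities are given as a matrix of pairwise distances; the function name `shortest_tour` is hypothetical, and this exhaustive search is not the method Saberi's group uses. It simply checks every possible route, and the number of routes grows factorially with the number of cities, which is why the problem is considered so hard.

```python
from itertools import permutations
from math import inf


def shortest_tour(dist: list[list[float]]) -> tuple[float, tuple[int, ...]]:
    """Exact traveling salesman by brute force (illustrative sketch only).

    dist[a][b] is the distance from city a to city b. The tour starts and
    ends at city 0 and visits every other city exactly once. Trying every
    ordering takes O(n!) time, so this is practical only for a few cities.
    """
    n = len(dist)
    best_len, best_tour = inf, ()
    for order in permutations(range(1, n)):
        tour = (0, *order, 0)
        # Sum the length of each leg of the round trip.
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


# Four cities at the corners of a unit square; diagonals have length sqrt(2).
d = 2 ** 0.5
square = [
    [0, 1, d, 1],
    [1, 0, 1, d],
    [d, 1, 0, 1],
    [1, d, 1, 0],
]
print(shortest_tour(square))  # (4, (0, 1, 2, 3, 0))
```

Walking around the edge of the square beats any route that cuts across a diagonal. With 20 cities, however, the same loop would have to examine more than 10^17 orderings.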
The team also plans to explore the connection between human behavior (the things we enjoy and choose to share in our social networks, or what we're looking for when we search online) and the algorithms that shape our online experience, such as friend recommendations or search engine results.
MEGA's algorithms might, for example, lead to a search engine that takes into account not only the keywords a user types, but also that user's social connections and what's trending online at that moment. Such a system would essentially construct a brand-new, highly personal search engine for every single search, he said.
A TIGHT NETWORK
Helping things along, the MEGA team enjoys close ties to technology companies including Facebook, Twitter and Cisco, which means its work may someday drive new features on popular social media sites. "It happens only occasionally that you can design an abstract system that actually affects society and the economy on such a large scale," Goel said.
The project likewise unites a diverse group of experts. Goel's expertise lies in algorithm design, and he is responsible for several of Twitter's algorithmic products. Two other Management Science and Engineering professors, Amin Saberi and Ramesh Johari, will also contribute their algorithmic and modeling knowledge. Andrea Montanari, an associate professor of electrical engineering and statistics, will be the team's statistician and information theorist, while Associate Professor of Computer Science Jure Leskovec brings expertise in data mining and modeling. Economics professor Matthew Jackson has been collecting data from villages in India and hopes to compare it with online networks like Facebook and Twitter. Also involved in the research is John Heidemann of USC's Information Sciences Institute.
"We were all having a lot of success in our individual research," Goel said, "but the DARPA grant allows us to work together to understand how social networks operate."
Kelly Servick is a science-writing intern working for the Stanford University School of Engineering.