Humans have always had a burning desire to know what the future holds, and science brings us ever closer to accurate predictions in many areas, especially with the rise of machine learning. In this special issue of Science, Prediction, researchers across disciplines delve into the advances and challenges in forecasting important outcomes related to policy, political violence, and human behavior. In a Report in the issue, particularly timely amid criticism of election prediction methods that failed to foresee Donald Trump's Electoral College victory, Ryan Kennedy et al. unveil a modeling technique that can predict election outcomes with up to 90% accuracy. The results show that polling data is highly predictive, and that accuracy can be further improved by correcting for biases. To gain more insight into factors that predict presidential election outcomes, Kennedy et al. compiled data on 621 elections across 86 countries between 1945 and 2012. They found a number of interesting correlations; for example, a strong negative relationship between the openness of the political regime and the probability of the incumbent party remaining in office. Good relations with the U.S. also increased the probability of the incumbent party holding office, the authors report. Their analysis reveals that even spotty polling data, in elections with fewer than five publicly available polls, retained a strong ability to predict presidential elections worldwide compared with other variables. The researchers further improved predictive power by aggregating poll data into a "smoothed" polling estimate of public opinion, rather than relying on individual polls. Raw polling data predicted outcomes with 80% accuracy, which rose to 90% when the smoothed estimate was used. Surprisingly, economic indicators were only weakly predictive.
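The article does not say how Kennedy et al. construct their smoothed polling estimate, so the following is only a hypothetical sketch of the general idea: pooling several noisy polls into a steadier series, here via a simple trailing average. The function name and poll numbers are invented for illustration.

```python
# Hypothetical sketch only: the Report's actual smoothing method is not
# described in this summary. A trailing average is one simple way to damp
# poll-to-poll noise when few polls are available.

def smoothed_estimate(polls, window=3):
    """Average each poll with up to `window - 1` preceding polls.

    polls: incumbent vote shares (0-1) in chronological order.
    Returns a smoothed series of the same length.
    """
    smoothed = []
    for i in range(len(polls)):
        lo = max(0, i - window + 1)
        chunk = polls[lo:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

raw = [0.48, 0.53, 0.47, 0.51, 0.52]   # five hypothetical polls
print(smoothed_estimate(raw))           # steadier series around ~0.50
```

The smoothed series varies far less than the raw polls, which is the intuition behind why an aggregate estimate can outpredict any single poll.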
A story by John Bohannon, from the Science news department, discusses this study in further detail.
In the first of several Essays in this issue, Lars-Erik Cederman and Nils B. Weidmann discuss the challenges of predicting political violence, which is highly context-dependent, making it difficult to compare one outbreak to another. Even in the same region, for example, context may differ across time. While earlier attempts to predict violence were unsuccessful, more recent models that rely on machine learning techniques such as neural networks have produced more accurate forecasts. Predicting rare violent events is particularly tricky because models are typically trained as if peace and violence were equally likely, yet many regions are peaceful most of the time. This problem can be addressed with resampling techniques, the authors say, which substantially raise a model's overall predictive accuracy. Despite these advances in predictive models of violence, however, such tools may be best suited to generating possible scenarios rather than producing specific policy advice, the authors conclude.
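The essay does not prescribe a particular resampling technique, but one standard option for rare-event imbalance, random oversampling of the minority class, can be sketched as follows. The function, labels, and toy data are illustrative only.

```python
import random

# Hypothetical sketch of one resampling fix for class imbalance: randomly
# oversample the rare "violence" class (with replacement) until it matches
# the common "peace" class, so a classifier no longer ignores rare events.
# Undersampling the majority class is the mirror-image alternative.

def oversample(examples, labels, rare_label, seed=0):
    rng = random.Random(seed)
    rare = [(x, y) for x, y in zip(examples, labels) if y == rare_label]
    common = [(x, y) for x, y in zip(examples, labels) if y != rare_label]
    # Draw with replacement from the rare class until the classes balance.
    extra = [rng.choice(rare) for _ in range(len(common) - len(rare))]
    data = common + rare + extra
    rng.shuffle(data)
    return [x for x, _ in data], [y for _, y in data]

X = [[0.1], [0.2], [0.3], [0.4], [0.9]]            # toy features
y = ["peace", "peace", "peace", "peace", "violence"]
Xb, yb = oversample(X, y, rare_label="violence")
print(yb.count("violence"), yb.count("peace"))     # balanced: 4 4
```

After resampling, both classes contribute equally to training, which is what lifts accuracy on the rare outcome.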
In a second Essay, Aaron Clauset and colleagues evaluate ways to predict when, and by whom, big scientific discoveries will be made, which could help inform how publishers and funding agencies evaluate manuscripts and project proposals. The authors highlight four areas that could help predict discovery: citations of past discoveries, who gets hired into career research positions, scientific productivity over a career, and the timing of major discoveries within a career. After discussing each of these factors in more depth, they caution against relying too heavily on predictive models of discovery, which could inadvertently discourage innovation and exacerbate existing inequalities in the scientific system.
A third Essay, by Philip E. Tetlock and colleagues, delves into the delicate world of political debate, which often hinges on competing claims about the probability of predicted consequences. In 2011, the U.S. Intelligence Advanced Research Projects Activity launched a four-year forecasting tournament to discover the factors that yield the most accurate probability estimates, finding that aggregating individual forecasts is the most effective way to draw wisdom from crowds. The best forecasters were found to score above average on measures of open-mindedness. Accounting for more detail at the micro scale produced more accurate predictions at the macro level, the results showed. Tournaments also seem to encourage people to view issues from multiple perspectives, the authors note, which could help pry open otherwise closed minds and depolarize unnecessarily polarized debates.
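The tournament's scoring rules are beyond this summary, but the core idea of drawing wisdom from crowds can be sketched as follows, assuming a simple unweighted average of forecasts evaluated with the standard Brier score. All numbers are invented for illustration.

```python
# Illustrative sketch, not the tournament's actual aggregation scheme:
# pool a crowd's probability estimates by averaging, then score the
# pooled forecast with the Brier score (squared error between the
# forecast and the 0/1 outcome; lower is better).

def aggregate(forecasts):
    """Unweighted mean of individual probability forecasts."""
    return sum(forecasts) / len(forecasts)

def brier(p, outcome):
    """Brier score for a single binary event (outcome is 0 or 1)."""
    return (p - outcome) ** 2

crowd = [0.6, 0.7, 0.8, 0.9]    # four forecasters on one event
pooled = aggregate(crowd)       # about 0.75
print(brier(pooled, 1))         # beats the crowd's average individual score
```

For these numbers the pooled forecast scores about 0.0625 while the average individual score is 0.075, a small version of the aggregation advantage the tournament found.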
In yet another Essay, Susan Athey discusses the challenges of using predictive machine learning approaches to inform policy decisions. Supervised machine learning (SML) takes an input training dataset and estimates, or "learns," parameters that can be used to make predictions on new data. Yet Athey notes that off-the-shelf SML methods may not accurately capture underlying assumptions or unstable factors. In the example of using SML to help city governments decide how to allocate safety inspectors, simply knowing which establishments are most likely to have violations may not be enough: other establishments with lower predicted risk may be easier and cheaper to improve substantially. Athey argues that pinpointing the causal effects of a given policy is critical, and that consistent and efficient estimation of causal effects can be achieved by modifying SML techniques, she concludes.
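As a rough illustration of the SML loop Athey describes (learn parameters from a labeled training set, then predict on new cases), here is a minimal one-feature logistic regression fit by stochastic gradient descent. It stands in for any off-the-shelf method; the inspection data and risk-score feature are invented.

```python
import math

# Minimal illustration of supervised machine learning: estimate ("learn")
# parameters w and b from labeled training data, then predict on new data.
# A one-feature logistic regression trained by stochastic gradient descent
# stands in for any off-the-shelf SML method.

def train(xs, ys, lr=0.5, steps=2000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x    # gradient ascent on the log-likelihood
            b += lr * (y - p)
    return w, b

def predict(w, b, x):
    """Predicted probability of a violation for feature value x."""
    return 1 / (1 + math.exp(-(w * x + b)))

# Toy data: past inspection risk score -> had a violation (1) or not (0).
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0,   0,   0,   1,   1,   1]
w, b = train(xs, ys)
print(predict(w, b, 0.85) > 0.5)   # True: flagged as high risk
```

As the essay cautions, such a model only ranks establishments by predicted risk; it says nothing about which inspections would causally reduce violations the most.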
An Essay by Jake M. Hofman et al. looks at how the social sciences have often focused on causal mechanisms while neglecting predictive accuracy. In a previous study, the authors demonstrated how the same Twitter data can be analyzed in ways that yield qualitatively different answers to the same question. They offer several explanations for this phenomenon and outline context-specific steps researchers can take to make their results more predictive. Among their recommendations, the authors encourage scientists to report their methods transparently within an open-access framework.
Lastly, an Essay by V. S. Subrahmanian and Srijan Kumar summarizes four key challenges in better predicting human activity: background noise in large datasets, prediction of rare events, capturing newly emerging phenomena, and accounting for dynamic factors.