I’m just back from a week in China. On the flight I read the Kindle version of Nate Silver’s The Signal and the Noise. Here’s my review.
Nate Silver, for readers who don’t know, writes the FiveThirtyEight blog, which used to be independent but is now part of the New York Times. The bread and butter of FiveThirtyEight is forecasting the outcome of US elections, based on polling data and other information. In the most recent US presidential election, FiveThirtyEight and other quantitative political blogs got a lot of press for confidently predicting a narrow victory by President Obama, when many non-quantitative political pundits were saying the election was too close to call, or even (if they were conservative) predicting a victory by the challenger, Mitt Romney. And FiveThirtyEight’s forecasting model didn’t just correctly predict the overall outcome, it correctly predicted the winner in all 50 states, and its predictions of the vote percentage for each candidate in each state were both accurate and precise.
This wasn’t Nate Silver’s first predictive success, within or outside election forecasting. As a baseball fan, I’m also familiar with his previous work for Baseball Prospectus developing PECOTA, a model for predicting the future performance of major league baseball players. PECOTA was among the first systems of its kind and remains very successful. Before that, he trained as an economist, spent some time working as a consultant for an accounting firm, and (in what seems to have been a very formative experience for him) spent a lot of time and made a lot of money playing online poker.
Now, he’s written a book about the general problem of predicting the future. Specifically, how we’re mostly really bad at it, with a few notable exceptions. He talks about his own experiences with predicting elections, baseball performance, and poker (playing poker well involves making predictions about what cards others might hold, based on limited information). And he also talks about the history of prediction in all sorts of other areas–the weather, hurricane tracks (two rare predictive success stories), earthquakes, computer chess, the stock market, the economy, gambling on sports, climate change, terrorist attacks, and more.
I liked the book a lot and recommend it for anyone interested in the broad problem of making predictions. And it is a broad problem; one of the real strengths of the book is how widely Silver casts his net to get insights into when predictions fail and when they succeed. He considers characteristics of the system one is trying to predict (e.g., is it chaotic). He talks about characteristics of the available data and background knowledge (e.g., is the system well-understood mechanistically, how much data do we have and on what variables). He talks about characteristics of the people trying to do the predicting (e.g., what incentives do they have to make good predictions, are they alert to common cognitive biases). He talks about what sort of predictions people are trying to make (e.g., predicting the time and location at which a particular event will occur, qualitative vs. quantitative predictions). And he talks about different techniques for generating predictions (e.g., betting markets, mechanistic models, statistical models). The book is filled with interesting nuggets I didn’t know about. It’s also very well-written and engaging. And I didn’t find any errors in discussions of subjects about which I know something (baseball, computer chess, the stock market, economics), which is reassuring.
What emerges is that there’s no universal recipe for making good predictions. Good prediction involves a lot of good judgment, by which I mean deciding how to weigh various general considerations in any particular case. A few things are always helpful, such as good mechanistic knowledge (which we have for weather, hurricanes, and poker), a large historical database of cases similar to the ones we’re trying to predict (which we have for baseball), and acknowledging all sources of uncertainty and error. And a few things are always unhelpful, most importantly our tendency to see “patterns” where there aren’t any and so overfit the data and make overconfident predictions. But in between, there are lots of things that are helpful in some circumstances but unhelpful in others. For instance, having more computing power has helped weather forecasters, who have long had exactly the right mechanistic model of the atmosphere but lacked the ability to simulate it at sufficiently fine spatial resolutions. More computational power also has helped in computer chess. But it hasn’t helped in earthquake prediction, because we lack the ability to even write down the correct mechanistic model, much less parameterize it. And having data on more predictor variables might sometimes be useful, but often it just means more noise that you have to filter out in order to find a signal, which dramatically increases the risk of overfitting. “Big Data” usually just means a bigger haystack you have to search to find the same needle. (Graduate students, heed this last lesson when designing your own projects! Don’t measure lots of variables just because you can, or because you feel like more data is always better!)
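The overfitting warning is easy to demonstrate numerically. Here’s a minimal sketch (my own illustration, not anything from the book): an ordinary least-squares regression with one genuine predictor, to which we add increasing numbers of pure-noise predictors. The in-sample fit can only improve as noise variables are added, while the out-of-sample fit deteriorates. All the variable names and numbers are made up for illustration.

```python
import numpy as np

# One real predictor plus varying numbers of pure-noise predictors.
# Illustrative numbers only: a small training set makes the overfitting vivid.
rng = np.random.default_rng(0)

n_train, n_test = 50, 1000
x_signal = rng.normal(size=(n_train + n_test, 1))  # the one real predictor
y = 2.0 * x_signal[:, 0] + rng.normal(size=n_train + n_test)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

results = {}
for n_noise in (0, 10, 40):
    noise = rng.normal(size=(n_train + n_test, n_noise))
    X = np.hstack([np.ones((n_train + n_test, 1)), x_signal, noise])
    # Fit on the training rows only, then score on both sets.
    beta, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    results[n_noise] = (
        r_squared(y[:n_train], X[:n_train] @ beta),  # in-sample R^2 (rises)
        r_squared(y[n_train:], X[n_train:] @ beta),  # out-of-sample R^2 (falls)
    )
    print(n_noise, results[n_noise])
```

This is the bigger haystack in miniature: every extra noise column gives the model another way to fit the training data’s quirks rather than the signal, so the apparent fit improves even as the predictions get worse.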
There are two aspects of the book I didn’t like. Neither is a huge deal, but both of them stuck in my craw a bit because I found them to be in such contrast to the rest of the book. I’ll talk about them both at length just because I feel like it. Don’t let the length of my comments here mislead you into thinking I didn’t like the book, because like I said I actually really liked it.
One thing I didn’t like is something Larry Wasserman and Brian also picked up on: Silver’s confusion about what it means to be “Bayesian” or “frequentist”. Silver spends a whole chapter ripping frequentist statistics as having seriously set back the progress of science, when in fact the proper way to make predictions–or indeed, any inferences whatsoever–is the Bayesian Way. There are several problems with this chapter:
- Throughout the book, Silver himself adopts what’s clearly a frequentist notion of “probability” as an objective feature of the world. By “probability” Silver means how often something happens in a long series of trials. For instance, he praises weather forecasters because, in the long run, it rains X% of the time when they say there’s an X% chance of rain. What Silver means by “Bayesian” is “using Bayes’ Theorem”, but he doesn’t seem to realize that one can do that quite happily in many contexts while retaining a frequentist notion of probability. Silver’s general explication of Bayes’ Theorem and why it is useful is couched in terms of subjective or “epistemic” probability (though he doesn’t use that term): probability as a measure not of the world but of our uncertainty (lack of knowledge) about the world. This is unfortunate because in practice that’s not actually the definition of probability that he himself uses. You’d think from his general explication of Bayes’ Theorem that Silver cares about subjective Bayesian credible intervals, but it’s clear from the specific case studies he discusses that he actually cares about frequentist confidence intervals. For instance, he explicates Bayes’ Theorem using a toy example of estimating whether your spouse is cheating on you, given the fact that you found a strange pair of underwear in your dresser drawer. Of course, your spouse either is or isn’t cheating on you–the true probability is either 1 or 0–but you can’t be sure which. Silver says this is a problem of epistemic uncertainty. But in practice, he treats your particular case as one of many such cases. That is, he views you and your spouse as members of a frequentist statistical population, so that one can objectively estimate quantities like the prior probability of your spouse cheating on you, at least roughly. Silver interprets this probability as the frequency with which people (presumably people sufficiently similar to you in relevant respects) are cheated on by their spouses.
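For concreteness, here is the underwear example as bare arithmetic. The numbers are illustrative stand-ins, not necessarily Silver’s exact figures; the point is that the prior is precisely the sort of population frequency (the rate at which comparable spouses cheat) that a frequentist would happily estimate.

```python
# Bayes' Theorem applied to the cheating-spouse toy example.
# All three inputs are illustrative stand-ins, not Silver's exact figures.
prior = 0.04                 # P(cheating): base rate among comparable couples
p_evidence_if_true = 0.50    # P(strange underwear | cheating)
p_evidence_if_false = 0.05   # P(strange underwear | not cheating)

numerator = p_evidence_if_true * prior
marginal = numerator + p_evidence_if_false * (1 - prior)
posterior = numerator / marginal

print(f"P(cheating | underwear) = {posterior:.3f}")  # prints 0.294
```

With these inputs, the evidence raises the probability from 4% to about 29%: substantial updating, but nowhere near certainty. Nothing in the calculation requires a subjective reading of the prior.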
- Silver makes the mistake, unfortunately all too common among certain sorts of Bayesians, of identifying “frequentist” with “mistaken or unhelpful applications of classical frequentist null hypothesis tests”. That is, he identifies all of frequentist statistics with the worst examples of it. This is unfair on multiple grounds. First, the purpose of null hypothesis testing often is not to aid prediction in Silver’s sense. Predicting the future is hugely important, but it’s not the only hugely important thing to do in science (you’d never know from Silver’s book that the Higgs boson was discovered and confirmed using classical frequentist statistical procedures). So Silver here is doing the equivalent of criticizing a car because it can’t fly. Second, he neglects to mention anywhere that frequentists themselves are among the strongest critics of many of the same practices he criticizes, such as failure to correct for multiple comparisons, or using one’s data to suggest hypotheses that are then tested on the same data. In contrast, throughout the book we meet all sorts of bad Bayesians, such as people who are too attached to their prior beliefs and refuse to update them in light of new evidence. Silver criticizes these people not for being bad Bayesians, but for…not being Bayesian at all. Just as no true Scotsman would ever commit a heinous crime, apparently no true Bayesian would ever engage in any of the bad applications of Bayesianism Silver ably criticizes. I’m not sure why Silver identifies frequentism with the worst of frequentism, and Bayesianism with the best of Bayesianism, but he does.
- Silver blames a focus on frequentist null hypothesis testing for many problems in scientific practice that have nothing to do with it, and that would not be fixed if tomorrow everybody adopted Silver’s preferred methods. Most published research findings are false not because of null hypothesis testing, but because of publication biases, and because of all the hidden biases that Silver himself quite rightly identifies and that are not fixed just by using Bayes’ Theorem. Bayesians of any stripe are just as capable as frequentists of finding excuses to exclude inconvenient data points on spurious grounds, just as subject to bad incentives and cognitive biases, just as inclined to fit overcomplicated models with too many predictor variables, etc.
The other aspect of the book I didn’t like was the chapter on forecasting climate change. I found the chapter very confusing; I have no idea what Silver was trying to say. The whole chapter bounces randomly from one issue to the next, with later sections often appearing to contradict earlier sections. Silver waffles between expressing what he calls healthy skepticism about predictions of climate change and expressing strong confidence in them. He waffles on whether the mechanistic complexity of GCMs is a virtue or a vice. He waffles between contrasting climate scientists unfavorably with weather forecasters, and saying that climate scientists are actually doing quite well given that they’re faced with an inherently much harder prediction problem. He presents in a positive way the work of “reasonable skeptics” like Scott Armstrong–and then in other places criticizes Armstrong’s approach. In some places he presents the lack of much increase in global mean temperature over the last 10 years as a reason for healthy skepticism about predictions of global warming, and in other places he dismisses it as expected stochasticity. Silver strongly criticizes climate scientists for developing and providing consensus views, and for emphasizing how many scientists agree with that consensus. Which is weird, for two reasons. One is that a consensus view is precisely what the IPCC is designed to provide, for rather obvious political reasons. It’s not as if the IPCC could’ve been based on prediction markets or some other non-consensus-based means of aggregating disparate views. Second, climate scientists quite often emphasize how much they all agree precisely because politically-motivated attacks on their work claim, falsely, that disagreement exists where in fact it doesn’t. How else are climate scientists supposed to respond to false claims that they disagree on some point besides saying “Um, no we don’t”?
Silver also criticizes the IPCC for changing their minds from one report to the next, which is totally bizarre coming from someone as keen on formal and informal Bayesian updating as Silver is. New information came in, the IPCC updated its reports accordingly–and this is bad? And finally, Silver doesn’t have any plausible suggestions of his own as to how climate scientists could do better. At the end, he just suggests that climate scientists stick to making predictions and completely recuse themselves from the political process and from any engagement with politically-motivated attacks on their work. But tellingly, he doesn’t say anything about how scientists could actually do this. I mean, is Silver suggesting that scientists should boycott the IPCC, decline to provide policy advice when asked for it, decline to give media interviews, decline to comment when their emails are hacked, not start websites like RealClimate, or what? Bottom line, this chapter did not convince me that Silver has much to teach the IPCC about climate forecasting, the clear communication of uncertainties in climate forecasts, or the best way to make use of scientific information in political decision-making.
I think many readers of this blog will like this book. If you like FiveThirtyEight, you’ll like this book. If you’re interested in prediction, this book’s right up your alley. If you want your students to learn good judgment and a healthy attitude about their own approach to statistics and modeling, this book will probably be at least as valuable to them as any technical textbook you might recommend. If you’re looking for a good range of examples (both positive and precautionary) to use in your stats courses, this book is a good place to turn.