I’m just back from a week in China. On the flight I read the Kindle version of Nate Silver’s The Signal and the Noise. Here’s my review.
Nate Silver, for readers who don’t know, writes the FiveThirtyEight blog, which used to be independent but is now part of the New York Times. The bread and butter of FiveThirtyEight is forecasting the outcome of US elections, based on polling data and other information. In the most recent US presidential election, FiveThirtyEight and other quantitative political blogs got a lot of press for confidently predicting a narrow victory by President Obama, when many non-quantitative political pundits were saying the election was too close to call, or even (if they were conservative) predicting a victory by the challenger, Mitt Romney. And FiveThirtyEight’s forecasting model didn’t just correctly predict the overall outcome, it correctly predicted the winner in all 50 states, and its predictions of the vote percentage for each candidate in each state were both accurate and precise.
This wasn’t Nate Silver’s first predictive success, within or outside election forecasting. As a baseball fan, I’m also familiar with his previous work for Baseball Prospectus developing PECOTA, a model for predicting the future performance of major league baseball players. PECOTA was the first system of its kind and remains very successful. Before that, he trained as an economist, spent some time working as a consultant for an accounting firm, and (in what seems to have been a very formative experience for him) spent a lot of time and made a lot of money playing online poker.
Now, he’s written a book about the general problem of predicting the future. Specifically, how we’re mostly really bad at it, with a few notable exceptions. He talks about his own experiences with predicting elections, baseball performance, and poker (playing poker well involves making predictions about what cards others might hold, based on limited information). And he also talks about the history of prediction in all sorts of other areas–the weather, hurricane tracks (two rare predictive success stories), earthquakes, computer chess, the stock market, the economy, gambling on sports, climate change, terrorist attacks, and more.
I liked the book a lot and recommend it for anyone interested in the broad problem of making predictions. And it is a broad problem; one of the real strengths of the book is how widely Silver casts his net to get insights into when predictions fail and when they succeed. He considers characteristics of the system one is trying to predict (e.g., is it chaotic). He talks about characteristics of the available data and background knowledge (e.g., is the system well-understood mechanistically, how much data do we have and on what variables). He talks about characteristics of the people trying to do the predicting (e.g., what incentives do they have to make good predictions, are they alert to common cognitive biases). He talks about what sort of predictions people are trying to make (e.g., predicting the time and location at which a particular event will occur, qualitative vs. quantitative predictions). And he talks about different techniques for generating predictions (e.g., betting markets, mechanistic models, statistical models). The book is filled with interesting nuggets I didn’t know about. It’s also very well-written and engaging. And I didn’t find any errors in discussions of subjects about which I know something (baseball, computer chess, the stock market, economics), which is reassuring.
What emerges is that there’s no universal recipe for making good predictions. Good prediction involves a lot of good judgment, by which I mean deciding how to weigh various general considerations in any particular case. A few things are always helpful, such as good mechanistic knowledge (which we have for weather, hurricanes, and poker), a large historical database of cases similar to the ones we’re trying to predict (which we have for baseball), and acknowledging all sources of uncertainty and error. And a few things are always unhelpful, most importantly our tendency to see “patterns” where there aren’t any and so overfit the data and make overconfident predictions. But in between, there are lots of things that are helpful in some circumstances but unhelpful in others. For instance, having more computing power has helped weather forecasters, who have long had exactly the right mechanistic model of the atmosphere but lacked the ability to simulate it at sufficiently fine spatial resolutions. More computational power also has helped in computer chess. But it hasn’t helped in earthquake prediction, because we lack the ability to even write down the correct mechanistic model, much less parameterize it. And having data on more predictor variables might sometimes be useful, but often it just means more noise that you have to filter out in order to find a signal, which dramatically increases the risk of overfitting. “Big Data” usually just means a bigger haystack you have to search to find the same needle. (Graduate students, heed this last lesson when designing your own projects! Don’t measure lots of variables just because you can, or because you feel like more data is always better!)
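The "more variables usually just means more noise" point is easy to demonstrate with a quick simulation. A minimal sketch (the data here are simulated and all the numbers are illustrative assumptions, not anything from the book): we fit an ordinary least-squares regression with one genuinely predictive variable, then again after piling on 35 pure-noise "predictors", and compare in-sample versus out-of-sample fit.

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def make_data(n, n_noise=35):
    # One real signal variable plus many pure-noise "predictors".
    x_signal = rng.normal(size=(n, 1))
    x_noise = rng.normal(size=(n, n_noise))
    y = 2.0 * x_signal[:, 0] + rng.normal(size=n)
    return x_signal, x_noise, y

xs_tr, xn_tr, y_tr = make_data(40)    # small training sample
xs_te, xn_te, y_te = make_data(200)   # independent test sample

def fit_and_score(X_tr, y_tr, X_te, y_te):
    # Ordinary least squares with an intercept column.
    A_tr = np.column_stack([np.ones(len(y_tr)), X_tr])
    A_te = np.column_stack([np.ones(len(y_te)), X_te])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return r_squared(y_tr, A_tr @ beta), r_squared(y_te, A_te @ beta)

# Model 1: just the real signal variable.
r2_small = fit_and_score(xs_tr, y_tr, xs_te, y_te)
# Model 2: signal plus 35 noise variables ("Big Data").
r2_big = fit_and_score(np.hstack([xs_tr, xn_tr]), y_tr,
                       np.hstack([xs_te, xn_te]), y_te)

print(f"signal only: train R2 = {r2_small[0]:.2f}, test R2 = {r2_small[1]:.2f}")
print(f"with noise : train R2 = {r2_big[0]:.2f}, test R2 = {r2_big[1]:.2f}")
```

The kitchen-sink model always looks better in-sample (it can't do worse, since it contains the small model as a special case), but its out-of-sample performance collapses. That's overfitting in one picture.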
There are two aspects of the book I didn’t like. Neither is a huge deal, but both of them stuck in my craw a bit because I found them to be in such contrast to the rest of the book. I’ll talk about them both at length just because I feel like it. Don’t let the lengthiness of comments here mislead you into thinking I didn’t like the book, because like I said I actually really liked it.
One thing I didn’t like is something Larry Wasserman and Brian also picked up on: Silver’s confusion about what it means to be “Bayesian” or “frequentist”. Silver spends a whole chapter ripping frequentist statistics as having seriously set back the progress of science, when in fact the proper way to make predictions–or indeed, any inferences whatsoever–is the Bayesian Way. There are several problems with this chapter:
- Throughout the book, Silver himself adopts what’s clearly a frequentist notion of “probability” as an objective feature of the world. By “probability” Silver means how often something happens in a long series of trials. For instance, he praises weather forecasters because, in the long run, it rains X% of the time when they say there’s an X% chance of rain. What Silver means by “Bayesian” is “using Bayes’ Theorem”, but he doesn’t seem to realize that one can do that quite happily in many contexts while retaining a frequentist notion of probability. Silver’s general explication of Bayes’ Theorem and why it is useful is couched in terms of subjective or “epistemic” probability (though he doesn’t use that term): probability as a measure not of the world but of our uncertainty (lack of knowledge) about the world. This is unfortunate because in practice that’s not actually the definition of probability that he himself uses. You’d think from his general explication of Bayes’ Theorem that Silver cares about subjective Bayesian credible intervals–but it’s clear from the specific case studies he discusses that he actually cares about frequentist confidence intervals. For instance, he explicates Bayes’ Theorem using a toy example of estimating whether your spouse is cheating on you, given the fact that you found a strange pair of underwear in your dresser drawer. Of course, your spouse either is or isn’t cheating on you–the true probability is either 1 or 0–but you can’t be sure which. Silver says this is a problem of epistemic uncertainty. But in practice, he treats your particular case as one of many such cases. That is, he views you and your spouse as members of a frequentist statistical population, so that one can objectively estimate quantities like the prior probability of your spouse cheating on you, at least roughly. Silver interprets this probability as the frequency with which people (presumably people sufficiently similar to you in relevant respects) are cheated on by their spouses.
- Silver makes the mistake, unfortunately all too common among certain sorts of Bayesians, of identifying “frequentist” with “mistaken or unhelpful applications of classical frequentist null hypothesis tests”. That is, he identifies all of frequentist statistics with the worst examples of it. This is unfair on multiple grounds. First of all, the purpose of null hypothesis testing often is not to aid prediction in Silver’s sense. Predicting the future is hugely important, but it’s not the only hugely important thing to do in science (you’d never know from Silver’s book that the Higgs boson was discovered and confirmed using classical frequentist statistical procedures). So Silver here is doing the equivalent of criticizing a car because it can’t fly. Second, he neglects to mention anywhere that frequentists themselves are among the strongest critics of many of the same practices he criticizes, such as failure to correct for multiple comparisons or using one’s data to suggest hypotheses that are then tested on the same data. In contrast, throughout the book we meet all sorts of bad Bayesians, such as people who are too attached to their prior beliefs and refuse to update them in light of new evidence–whom Silver criticizes not for being bad Bayesians, but for…not being Bayesian at all. Just as no true Scotsman would ever commit a heinous crime, apparently no true Bayesian would ever engage in any of the bad applications of Bayesianism Silver ably criticizes. I’m not sure why Silver identifies frequentism with the worst of frequentism, and Bayesianism with the best of Bayesianism, but he does.
- Silver blames a focus on frequentist null hypothesis testing for many problems in scientific practice that just have nothing to do with that, and that would not be fixed if tomorrow everybody adopted Silver’s preferred methods. Most published research findings are false not because of null hypothesis testing, but because of publication biases, and because of all the hidden biases that Silver himself quite rightly identifies and that are not fixed just by using Bayes’ Theorem. Bayesians of any stripe are just as capable as frequentists of finding excuses to exclude inconvenient data points on spurious grounds, just as subject to bad incentives and cognitive biases, just as inclined to fit overcomplicated models with too many predictor variables, etc.
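To make the frequentist flavor of the spouse example concrete: every number fed into Bayes’ Theorem there is, in practice, a long-run frequency across a population of similar cases. A minimal sketch (the specific numbers below are my own illustrative choices, not necessarily the book’s):

```python
# Silver-style cheating-spouse toy example of Bayes' Theorem.
# All three inputs are interpreted as frequencies across many
# comparable couples; the values are illustrative assumptions.
prior = 0.04          # P(cheating): base rate among similar couples
p_e_given_h = 0.50    # P(strange underwear | cheating)
p_e_given_not = 0.05  # P(strange underwear | not cheating)

# Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E)
evidence = p_e_given_h * prior + p_e_given_not * (1 - prior)
posterior = p_e_given_h * prior / evidence

print(f"posterior probability of cheating: {posterior:.2f}")  # roughly 0.29
```

A modest prior keeps one alarming piece of evidence from being decisive: even with the underwear ten times likelier under the cheating hypothesis, the posterior lands well under 50%. But note that the prior here is estimated exactly the way a frequentist would estimate it.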
The other aspect of the book I didn’t like was the chapter on forecasting climate change. I found the chapter very confusing; I have no idea what Silver was trying to say. The whole chapter bounces randomly from one issue to the next, with later sections often appearing to contradict earlier sections. Silver waffles between expressing what he calls healthy skepticism about predictions of climate change, and expressing strong confidence in them. He waffles on whether the mechanistic complexity of GCMs is a virtue or a vice. He waffles between contrasting climate scientists unfavorably with weather forecasters, and saying that climate scientists are actually doing quite well given that they’re faced with an inherently much harder prediction problem. He presents in a positive way the work of “reasonable skeptics” like Scott Armstrong–and then in other places criticizes Armstrong’s approach. In some places he presents the lack of much increase in global mean temperature over the last 10 years as a reason for healthy skepticism about predictions of global warming, and in other places he dismisses it as expected stochasticity. Silver strongly criticizes climate scientists for developing and promoting consensus views, and for emphasizing how many scientists agree with that consensus. Which is weird, for two reasons. One is that a consensus view is precisely what the IPCC is designed to provide, for rather obvious political reasons. It’s not as if the IPCC could’ve been based on prediction markets or some other non-consensus-based means of aggregating disparate views. Second, climate scientists quite often emphasize how much they all agree precisely because politically-motivated attacks on their work claim, falsely, that disagreement exists where in fact it doesn’t. How else are climate scientists supposed to respond to false claims that they disagree on some point besides saying “Um, no we don’t”?
Silver also criticizes the IPCC for changing their minds from one report to the next, which is totally bizarre coming from someone as keen on formal and informal Bayesian updating as Silver is. New information came in, the IPCC updated its reports accordingly–and this is bad? And finally, Silver doesn’t have any plausible suggestions of his own as to how climate scientists could do better. At the end, he just suggests that climate scientists stick to making predictions and completely recuse themselves from the political process and from any engagement with politically-motivated attacks on their work. But tellingly, he doesn’t say anything about how scientists could actually do this. I mean, is Silver suggesting that scientists should boycott the IPCC, decline to provide policy advice when asked for it, decline to give media interviews, decline to comment when their emails are hacked, not start websites like RealClimate, or what? Bottom line, this chapter did not convince me that Silver has much to teach the IPCC about climate forecasting, the clear communication of uncertainties in climate forecasts, or the best way to make use of scientific information in political decision-making.
I think many readers of this blog will like this book. If you like FiveThirtyEight, you’ll like this book. If you’re interested in prediction, this book’s right up your alley. If you want your students to learn good judgment and a healthy attitude about their own approach to statistics and modeling, this book will probably be at least as valuable to them as any technical textbook you might recommend. If you’re looking for a source for a good range of examples (both positive and precautionary) to use in your stats courses, this book is a good place to turn.
One of my favorite reviews of this book yet, and very similar to my own opinion, though you more ably describe the faults in his reasoning about both styles of statistics. Thank you for this review.
Thanks for the review! Sounds very interesting and also applicable for me as an ecologist often interested in predicting things. Just ordered it on Amazon.
It is a bit surprising to me that someone who knows so much more about statistics than I do would be so wrong about what “frequentist” is and isn’t. Although, who knows, maybe we as ecologists glorify one approach over the other by cherry-picking bad uses of the opposing approach and good uses of our favorite approach.
Well, as Brian, Steve Walker, and Larry Wasserman’s posts have noted, Nate Silver is far from the only person to define “Bayesian” in the way he does. And he’s far from the first person to identify “frequentist” with particular abuses of classical null hypothesis testing. So while I don’t like that bit of his book and don’t agree with it, I don’t think the problem is that Silver just doesn’t know enough. I highly doubt that he’s just ignorant.
As to why Silver says what he says about frequentist stats, all I can do is speculate. I suspect what a lot of it comes down to is that he’s found applying Bayes’ Theorem, formally and informally, very useful in his own life. In particular, the right way to play a hand of Texas hold ’em poker really is to repeatedly use Bayes’ Theorem to update your probability estimates of the hands your opponents hold (ok, there’s more to it than that, but insofar as “prediction” is involved, that’s the right way to do it). The way his election models for FiveThirtyEight work is via Bayesian updating as new poll data come in. Etc. And like a lot of people, he’s impressed by the apparent match between the way Bayes’ Theorem works–you update your prior when new data come in–and a sort of vague-but-compelling idealized vision of how science ought to work–over time you learn more and more, and so get closer and closer to the truth, right? In contrast, nowhere in the book does he have any occasion to talk to people doing randomized controlled experiments (Well, he notes that Google does thousands of little experiments every year, most of which end up rejecting the ideas that inspired them. But the lesson he draws from that is “make lots of predictions; constantly put your beliefs to the test”, not “hey, there’s more to learning about how the world works than just Bayesian updating”). So I think he probably just overgeneralized from his own experience. I actually really wish he’d talked to the physicists working at the Large Hadron Collider–I wonder how it would’ve changed his book.
Or, failing that, I wish he’d have talked about the full history of Bayesian thinking, rather than skipping from the Reverend Bayes and Laplace to the present day. A theme of the book is that of intellectual progress, albeit progress that’s often uneven, delayed, or temporarily set back. He could easily have done a chapter-length potted history of different lines of Bayesian thought and spun it as a story of fitful but eventual progress, with approaches like full-on subjective Bayesianism treated as unfortunate setbacks along the way to learning how to be good Bayesians.
He also perhaps could’ve talked a bit more about where we get the mechanistic, causal knowledge that he says is often key for building good predictive models. The classic use of randomized controlled experiments is of course to test for causal connections between variables.
And in fairness, as you say I’m naturally more alert than someone like Nate Silver to good examples of the approaches I myself use, and bad examples of the approaches others use. But on the other hand, as I’ve tried to suggest above I think there are ways he could’ve tweaked the book that would not have watered down his main messages–not even his advocacy for the usefulness of Bayesian updating–while making the bits on frequentism more fair and accurate. He didn’t need to slam a caricature of frequentism to make his case.
Pingback: The downside of data sharing: more false results | Dynamic Ecology
Pingback: Bonus Friday link: Deborah Mayo on Nate Silver | Dynamic Ecology
Pingback: Friday links: good cartoons of bad arguments, and more | Dynamic Ecology
I’ve just been reading the book and didn’t know that you had done a review of it. I have really enjoyed the book and highly highly recommend it.
In contrast to Jeremy I especially enjoyed the chapter on climate prediction. I understood the points that Silver was making in that chapter, but concede that the writing isn’t always perfectly clear and the organization is a bit odd.
I think one of the main points he was trying to make is that there is healthy debate among climate scientists about the strengths and weaknesses of the climate models and their specific predictions. He is trying to communicate to the layman that climate scientists are not a bunch of sheep just reinforcing their group beliefs. They are independent actors that don’t always agree. He’s making this point not as a critique of the science, but rather in praise of the science and the process that led to the IPCC predictions.
Essentially he’s arguing that many of us confuse the term “consensus” for lockstep unanimity on all the aspects of the science. That would actually be bad and would reinforce skeptics’ belief that there is self-censorship and conspiracy among the climate scientists.
What I liked about the chapter was that Silver does a great job highlighting a key fact about climate change that often gets overlooked. The greenhouse effect is a relatively simple and well understood process that is universally accepted. Back of the envelope calculations relating CO2 concentration to temperature do a good job predicting the basic trends in global temps.
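The back-of-envelope calculation mentioned above can be sketched in a few lines. The standard simple relationship is that warming scales roughly with the logarithm of CO2 concentration; the sensitivity figure below (about 3°C per doubling of CO2) is an assumed round number in the vicinity of commonly cited estimates, not a precise value, and this is only an illustration of the shape of the relationship.

```python
import math

def warming(c_now_ppm, c_ref_ppm=280.0, sensitivity_per_doubling=3.0):
    """Rough eventual warming (deg C) relative to a reference CO2 level.

    Assumes warming is proportional to the number of CO2 doublings,
    with an assumed illustrative sensitivity of 3 C per doubling.
    """
    return sensitivity_per_doubling * math.log2(c_now_ppm / c_ref_ppm)

# From a roughly pre-industrial 280 ppm to about 400 ppm:
print(f"{warming(400):.1f} C")  # about 1.5 C
```

The point of the logarithm is that each successive doubling of CO2 contributes about the same increment of warming, which is why the basic trend is predictable even though the detailed regional dynamics are not.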
Pingback: Stop and smell the data | Making Information Visible
Pingback: Stop and smell the data « OptimalHq
Pingback: Stats vs. scouts, polls vs. pundits, and ecology vs. natural history | Dynamic Ecology
Missed this at the time, but here’s climate scientist Michael Mann’s critique of Silver’s chapter on climate change:
Mann doesn’t think highly either of the chapter in general, or of Silver’s portrayal of their interview. He picks up on some of the same problems I picked up on, and others besides.
On the other hand, I think it’s unfortunate that Mann indulges in speculation about whether Silver’s errors trace back to his having attended the University of Chicago. Intentionally or not, this bit comes off as a strange, amateurishly transparent attempt to tar Silver. (Protip: if you find yourself criticizing a *progressive* like Silver because you think he *might* have taken a class with Milton Friedman or Steven Levitt a couple of decades ago, you should probably start hitting the backspace key. And if you find yourself admitting the pointlessness of *your own argument* by writing “Maybe Nate doesn’t share that philosophy, but those who educated him do”, you should *definitely* start hitting the backspace key!) And in complaining as he does about how Silver portrayed their interview (leaving out or simplifying lots of points Mann made), Mann evinces no sense that Silver was writing a whole book, and that his interview with Mann was only going to be one tiny piece of one chapter.
Pingback: Friday links: animals jumping badly, the recency of publish-or-perish, mythologizing wolves, and more | Dynamic Ecology
Pingback: Friday links: valuing scientists vs. science, real stats vs. fake data, Pigliucci vs. Tyson, and more | Dynamic Ecology
Another review of Silver’s book: http://junkcharts.typepad.com/numbersruleyourworld/2012/12/book-review-the-signal-and-the-noise.html
Pingback: The meteorologist’s job | Bayesian Philosophy
Pingback: Stop and smell the data | OptimalBI
Pingback: Friday links: t-shirts vs. coronavirus, cafeteria menus vs. scholarly papers, and more | Dynamic Ecology
Pingback: Ask us anything: the future of machine learning in ecology | Dynamic Ecology