The worst forecasting failures and what we can learn from them

One standard line about the importance of prediction and forecasting in science is that you can learn from failed predictions. Figure out why the prediction failed, and you can make better predictions in future (or else explain why better predictions are impossible).

There certainly are examples of that process working well–think of weather forecasting or hurricane track forecasting. But then consider this case, in which professional forecasters–people with money on the line–have consistently gotten their predictions of future 10-year Treasury yields wrong in the same direction, for twenty years. Highly trained professionals who stand to make a lot of money if they predict future yields correctly just keep predicting that yields are going to spike in the near future, and they just keep being wrong. They’re like stopped clocks that aren’t even right twice a day.

As another example, consider how everyone who has tried to forecast future costs of solar power has consistently missed high, year after year after year. “Everyone” includes individuals and groups openly advocating for solar power, opponents of solar power, and neutral government agencies.

Speaking as a scientist who knows a bit about forecasting, these seem to me like some of the worst forecasting failures in history. What can we learn from them? What’s going on in such cases?

Apparently it’s not “political biases” or “lack of strong incentives to make correct forecasts”. Nor can it be “some time series are intrinsically hard to predict”. This isn’t earthquake forecasting; we’re not trying to predict rare events here. And it’s not that the linked time series are too short, or chaotic, or just random sequences of independent observations, or contaminated with massive sampling or measurement error. Heck, solar power costs are pretty well fit by a decreasing exponential function! From a permutation entropy perspective, they’re about as intrinsically predictable as a time series can possibly be! My undergrad biostats students could’ve just naively extrapolated a simple exponential function fit to the data, and beaten every professional power industry analyst!

And it can’t be that “things changed”, because the whole reason these forecasts were consistently wrong in the same direction is that things didn’t change. People kept predicting that things would change–that T-bill yields would quit declining and spike, that solar power costs would quit dropping so fast. And things kept not changing, over and over and over again. Apparently, forecasting isn’t hard only when naive trend extrapolation fails. It’s also hard when naive trend extrapolation works!

Finally, it can’t be “reluctance to extrapolate short-term trends”, because (i) these are long-term trends, and (ii) people usually are happy to extrapolate from short-term trends. Indeed, forgetting about mean reversion, and so being too quick to extrapolate from short-term trends, is one of the most common forecasting errors there is!

So if people are too quick to extrapolate trends in so many other contexts, how come they’re too slow to extrapolate in the very contexts where extrapolation would be a good idea (or at least a clear improvement on whatever other forecasting method people are using)?
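For concreteness, here’s what that kind of naive extrapolation amounts to. This is only a sketch with made-up cost numbers (not actual solar data): fit a straight line to log(cost) versus year, then extend the fitted line forward.

```python
import numpy as np

# Hypothetical cost series following a rough exponential decline
# (these values are invented for illustration, not real solar cost data)
years = np.arange(2000, 2011)
cost = np.array([5.0, 4.4, 3.9, 3.6, 3.1, 2.8, 2.5, 2.3, 2.0, 1.8, 1.6])

# "Naive trend extrapolation": least-squares fit of log(cost) = a + b*year,
# then extend the fitted line forward and exponentiate back
b, a = np.polyfit(years, np.log(cost), 1)
future = np.arange(2011, 2016)
forecast = np.exp(a + b * future)
print(np.round(forecast, 2))  # a continued exponential decline
```

Nothing more sophisticated than that would have beaten the professional forecasters in the cases linked above.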

So what is going on here, if it’s not any of the usual sources of forecasting errors? Here’s a tentative hypothesis: nobody will ever adopt naive trend extrapolation as a forecasting method, because they know it has to fail at some point in the future. Anything that can’t go on forever will stop, as Herbert Stein said. The trouble is, knowing that an observed trend has to stop or reverse at some undetermined point in the future doesn’t mean it can’t continue just fine for an awfully long while, and doesn’t let you put a confidence interval on how long the trend will continue.* But if you have no idea when the trend will stop, you just stick with your model that keeps incorrectly predicting that the trend will stop soon. Because you know that at some point your model will be right (and you hope it’s soon). Whereas if you junk your model and just switch to naive trend extrapolation, you know that at some point you’ll be wrong (and you worry that it’s soon). Maybe people prefer knowing that they’ll eventually be right to knowing that they’ll eventually be wrong.

Another hypothesis, not mutually exclusive with the first: naive trend extrapolation doesn’t feel like forecasting at all; it feels like an expression of ignorance. “I have no idea why this trend is happening, so I predict it’ll continue.” Maybe nobody wants to confess ignorance. Even if your ignorant guess turns out to be right repeatedly, well, that maybe feels like you just made a series of lucky guesses.

Here’s a more depressing hypothesis, which I’m not sure is really a “hypothesis” so much as just giving up on answering the question: people are just perverse. When attempting to forecast the future in contexts in which they should believe in mean reversion, they insist on spotting long-term trends that aren’t really there. When attempting to forecast the future in contexts in which they should believe in long-term trends, they insist on seeing signs of imminent mean reversion that aren’t really there. The exercise of coming up with an evolutionary just-so story as to why such perversity would’ve been adaptive for ancestral hominids in East Africa is left to commenters. 😉

A final hypothesis is that there are no generalizable lessons here. That each case of repeated forecasting errors in the same direction happens for its own unique, idiosyncratic reasons.

Looking forward to your comments.

*I’m reminded of repeated failed predictions of near-term societal collapse due to overpopulation. Yes, it’s true that human population growth can’t continue forever–but that fact is totally useless for forecasting human population growth on any time scale relevant to human decision-making.

17 thoughts on “The worst forecasting failures and what we can learn from them”

    • I wouldn’t consider that as bad as the forecasting failures in the post, because it’s a one-off failure. Not a repeated failure in the same direction. Plus, as you say, the prediction worked pretty well in the short term. Whereas the forecasts in the post were all revealed as inaccurate pretty much right away.

      Now, you could argue that the forecast was bad not just because it was very inaccurate, but because there were many good reasons to think the logistic equation was a bad model of the US population. But on the other hand, if your forecast of T-bill yields is wrong in the same direction 20 times in a row, clearly your model of T-bill yields is a bad model too!

  1. Do you think we should distinguish between predictions that actually affect the future, and those that don’t? For example, economic forecasts can affect market behaviour making them more or less likely to be accurate, but future earthquakes will always be indifferent to current predictions.

    In the latter case, prediction failure probably has some technical cause, like poor model formulation or incomplete data. In the former case, I suspect prediction failure is due to the futility of trying to predict unpredictable systems; scenario analysis is probably more appropriate for these cases (e.g. the assumptions of the scenario are as important as the model prediction).

    • Predictions that affect future human behavior certainly are much harder to get right than predictions that don’t. Though in the cases discussed in the post, it’s not clear to me that forecasts of T-bill yields or solar power costs had much effect on the variables being forecast.

  2. Nice examples Jeremy. Like you say, it definitely raises questions when the forecasts are off repeatedly, and are off by predicting outliers rather than by failing to anticipate some large deviation.

    From my quick read, I think it’s hard to dismiss the incentives question so easily here, or at least it’s not so simple as saying “a more accurate forecast would be more profitable” or “groups both supporting and opposing an expansion of solar are off in the same direction.” First, in all cases, it sounds like the forecasters in question care about the consequences of the forecast, so they have a utility at stake other than forecast skill. No forecast is gonna be perfect, and there’s no reason to believe the costs of being wrong are perfectly symmetric. If the cost of predicting under the mark by delta is 10 times the cost of predicting over the mark by delta, you have an incentive to over-predict, even though you’d benefit even more if your prediction were perfect. Uncertainty matters.

    Another hypothesis, at least for the solar case, seems to be data bias. The author notes:

    > But by far the most substantial error was this: When comparing costs to cumulative amount of solar capacity deployed, I used only the solar capacity installed inside the United States. I believed, since I was using prices of US projects, that comparing US prices to US cumulative capacity was the appropriate comparison of apples to apples. That was incorrect. Solar is a global industry.

    and re-running the model notes:

    > … using the model to project forward from 2015 to 2020 prices, changing to global cumulative scale reduces the error in my 2015 forecast by a factor of 3
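The asymmetric-cost point a few paragraphs up can be made concrete with a small simulation. The numbers here are entirely hypothetical; the assumption is that under-predicting by some amount costs 10 times as much as over-predicting by the same amount.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical uncertain future outcome (units and distribution invented)
outcomes = rng.normal(100, 10, size=100_000)

def expected_loss(pred, under_cost=10.0, over_cost=1.0):
    """Average loss of point forecast `pred`: each unit of under-prediction
    costs `under_cost`, each unit of over-prediction costs `over_cost`."""
    err = pred - outcomes
    return np.mean(np.where(err >= 0, over_cost * err, under_cost * (-err)))

# Grid-search for the loss-minimizing point forecast
grid = np.linspace(80, 130, 501)
best = grid[np.argmin([expected_loss(p) for p in grid])]
print(best)  # lands well above the true mean of 100
```

With linear asymmetric loss the optimal point forecast is the under_cost/(under_cost+over_cost) quantile of the outcome distribution (here 10/11 ≈ 0.91), so a forecaster facing these costs would rationally keep over-predicting even with a perfectly calibrated view of the uncertainty.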

    • Good catch on the solar case, I’d overlooked that in my skim.

      Good point re: asymmetrical costs of forecasting errors. I have no idea if there are asymmetrical costs in either of these cases.

      There’s also the possibility of herding. Better to be wrong in the same way as everyone else, than to take even a slight risk of being wrong when everyone else is right? You see this in US political polling in the days immediately before a federal election.

  3. Have you ever tried to fit the logistic differential equation, dx/dt = rx(1-x/k), to time-series data, x1, x2, …, xn, that was actually generated from the logistic model, with a bit of observation error? If xn < k/3, there is wild uncertainty in k. You basically can't even get the order of magnitude right. Even as xn gets past k/2, the uncertainty in k is still really high. The curve needs to really start leveling off before you can estimate k with any reasonable level of accuracy. And this is with a perfectly specified model!

    Also, this seems a bit related to your post on forcing a zero intercept because you know it has to be zero. The forecasters know the quantity has to level off, so they use models where this happens.

    Edit: I didn't see ric's comment before writing this, but this is also a possible explanation for the Pearl & Reed study, though I haven't read that paper.

    • Yes, when your data only consist of observations well below K, you can’t estimate K with any precision. The logistic looks more or less exponential when N is well below K. I teach this to my undergrad population ecology students.

      This spring we saw some dismaying illustrations of the same problem making the news. People fitting models of the Covid-19 outbreak to case data that were growing exponentially, and then extrapolating the models to project the future course of the outbreak. But just about anything can fit an exponential, and so models that predicted very different futures for the outbreak fit the data equally well.

      • Maybe I found this more surprising than you did. I thought that at k/2, or 0.6k, you'd be able to estimate k much better than you actually can when you do the fitting.
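A minimal simulation of the identifiability problem discussed in this thread (synthetic data; the parameter values are chosen purely for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(42)

def logistic(t, r, K, x0):
    # Solution of dx/dt = r*x*(1 - x/K) with x(0) = x0
    return K / (1 + (K / x0 - 1) * np.exp(-r * t))

# True parameters; observe only the early, near-exponential phase (x_n < K/3)
r_true, K_true, x0_true = 0.5, 1000.0, 10.0
t = np.arange(0, 8)  # by t = 7, x is only about K/4
x = logistic(t, r_true, K_true, x0_true)
obs = x * rng.lognormal(0.0, 0.05, size=t.size)  # small observation error

popt, pcov = curve_fit(logistic, t, obs, p0=[0.4, 500.0, 8.0], maxfev=20000)
# The reported standard error on K is typically enormous here: data from the
# near-exponential phase simply can't pin down the ceiling
print(f"K estimate: {popt[1]:.0f} +/- {np.sqrt(pcov[1, 1]):.0f}")
```

Re-running with t extended until the curve visibly levels off shrinks the uncertainty on K dramatically, which is exactly the pattern described above.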

  4. Here’s a thought: maybe forecasting errors like the ones described in the post–repeatedly predicting an imminent break in the long-term trend–actually *are* an example of forgetting about regression to the mean? Think of the mean to which things keep (strongly) regressing as undergoing a long-term increase or decrease, rather than being an unchanging constant. Then if people keep seizing on short-term deviations from the long-term trend and extrapolating them, they’ll keep getting things wrong as the variable returns to its long-term increasing or decreasing trend line.

    I don’t think this can be the full story for either of the cases described in the post, since the mistaken forecasts are always mistaken in the same direction. So they’re not solely a matter of people seizing on random deviations from the long-term trend and extrapolating them. But maybe this is part of the story?

  5. Jeremy, I also worry that resistance to “naive trend extrapolation” could hold back forecasting in ecology too. We love our processes, and we love the idea that increased process-level understanding will improve prediction. But as you point out, at some scales that understanding may not help at all. We know from lots of other disciplines that beating simple statistical extrapolations can be hard, especially at short forecast time scales. I think the key for us is to be pragmatic–try different models, compare them, leave your philosophical preferences at the door, and use what works. This is also the idea behind the just-launched EFI NEON Forecast Challenge. A key feature of the challenge is that a naive null model (like persistence) is built into the competition; the challenge is to beat that null model. Lots of great learning opportunities here.
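As a toy illustration of what a persistence null model looks like (made-up numbers, not actual NEON data), here persistence is compared against another naive null, forecasting the historical mean:

```python
import numpy as np

# Toy daily series with a mild trend (values invented for illustration)
y = np.array([12.0, 12.4, 12.1, 12.9, 13.3, 13.1, 13.8, 14.0, 13.7, 14.2])

# Persistence null: tomorrow's forecast is simply today's value
persistence_pred = y[:-1]
obs = y[1:]

# A second naive null: forecast the mean of all values observed so far
mean_pred = np.array([y[:i].mean() for i in range(1, len(y))])

def rmse(pred, o):
    return float(np.sqrt(np.mean((pred - o) ** 2)))

print("persistence RMSE:    ", rmse(persistence_pred, obs))
print("historical-mean RMSE:", rmse(mean_pred, obs))
```

On a trending series like this one, persistence beats the historical mean, which is part of why persistence makes a demanding null model at short forecast horizons.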

  6. Another example of a forecaster consistently missing in the same direction over and over again: the US Congressional Budget Office forecasting the interest rate on US government debt (10-year Treasury bonds).

    Interesting that the forecasts used to be good, or at least the errors were unbiased, back during the presidencies of Reagan and Bush I. They started consistently missing high in the early ’90s, at the start of the Clinton presidency and the early ’90s recession.

