Everybody knows that a correlation between two variables doesn’t imply a causal connection between them. But it’s often pretty tempting to think so, especially if the correlation is a really strong one. I mean, there must be some reason for the correlation, right?* Conversely, the complete absence of a correlation between two variables doesn’t imply the absence of a causal connection between them. But again, it sure is suggestive, isn’t it? I mean, if there was any sort of causal connection between those two variables that was strong enough to be worth worrying about, you’d probably see some sort of correlation, right?
For instance, let’s say you’re studying changes in the abundance of some species over time. And let’s say that abundance just bounces around more or less randomly. There are no cycles or any other obvious temporal pattern, and no long-term increasing or declining trend. And let’s say that you find a really strong correlation between abundance fluctuations and some weather variable: maybe abundance goes up in wet years and down in dry ones or something. Past densities, meanwhile, don’t really explain these fluctuations. So while correlation doesn’t imply causation, it sure looks like weather is what matters for population dynamics, right? Further, the data strongly suggest that density dependence is at most weak and quite possibly absent entirely, right? Because after all, if population density really mattered much, then surely environmental factors wouldn’t explain so much of the variation in the data, and past densities would explain a lot, right?
Wrong. Indeed, not just wrong but maximally wrong. The very opposite of true. In a world with negative density dependence (aka “negative feedback”, “stabilizing forces”, “return tendency”, and other terms), correlation is not just an unreliable or imperfect guide to causation but a positively misleading guide. The data described above suggest a world in which density dependence is strong, not weak. (Don’t believe me? Read Ziebarth et al. 2010.)
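You can see this in about twenty lines of code. Here’s a minimal simulation of my own (not taken from Ziebarth et al., though it assumes the same sort of Gompertz-type, log-linear model that literature analyzes; all the coefficients are made up for illustration). Log abundance is pulled strongly back toward equilibrium every time step, yet the naive correlations make weather look all-important and density look irrelevant:

```python
import random
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
b = 0.2   # autoregressive coefficient: SMALL b means STRONG density dependence
c = 1.0   # effect of weather on log abundance
n = 5000

weather = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [0.0]  # log abundance, as deviation from equilibrium
for t in range(1, n):
    # Strong density dependence pulls x back toward equilibrium each step,
    # while weather and demographic noise push it around.
    x.append(b * x[t - 1] + c * weather[t] + random.gauss(0.0, 0.3))

corr_weather = pearson(x[1:], weather[1:])  # abundance vs. current weather
corr_lag = pearson(x[1:], x[:-1])           # abundance vs. past abundance

print(f"corr(abundance, weather)      = {corr_weather:.2f}")  # strong: "weather matters!"
print(f"corr(abundance, past density) = {corr_lag:.2f}")      # weak: "density doesn't!"
```

In this toy world the population recovers 80% of any perturbation every single time step, which is about as strongly density dependent as dynamics get. Yet weather correlates strongly with abundance, and past density correlates only weakly with it, exactly the pattern my hypothetical ecologist read as evidence against density dependence.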
In a world with negative feedback, correlations among variables are not a reliable guide to causation. Indeed, they’re often positively misleading. The best way to explain why is with an analogy from economics, where the same error often gets made. The analogy is known as “Milton Friedman’s thermostat”, after the economist who suggested it in a famous paper. Here’s a very clear version of the analogy, from an old post by economist Nick Rowe:
If a house has a good thermostat, we should observe a strong negative correlation between the amount of oil burned in the furnace (M), and the outside temperature (V). But we should observe no correlation between the amount of oil burned in the furnace (M) and the inside temperature (P). And we should observe no correlation between the outside temperature (V) and the inside temperature (P). An econometrician, observing the data, concludes that the amount of oil burned had no effect on the inside temperature. Neither did the outside temperature. The only effect of burning oil seemed to be that it reduced the outside temperature. An increase in M will cause a decline in V, and have no effect on P. A second econometrician, observing the same data, concludes that causality runs in the opposite direction. The only effect of an increase in outside temperature is to reduce the amount of oil burned. An increase in V will cause a decline in M, and have no effect on P. But both agree that M and V are irrelevant for P. They switch off the furnace, and stop wasting their money on oil.
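Nick’s scenario is easy to verify numerically. Here’s a quick sketch (the linear heat-loss model and all the constants are my own inventions, chosen only for illustration): the thermostat holds inside temperature near its setpoint up to a small independent control error, and oil burned offsets the heat lost to the outside.

```python
import random
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(2)
n = 10000
target = 20.0

V = [random.gauss(0.0, 5.0) for _ in range(n)]           # outside temperature: varies a lot
P = [target + random.gauss(0.0, 0.2) for _ in range(n)]  # inside temp: held near the setpoint
M = [0.5 * (p - v) for p, v in zip(P, V)]                # oil burned offsets the heat loss

corr_MV = pearson(M, V)
corr_MP = pearson(M, P)
corr_VP = pearson(V, P)

print(f"corr(oil burned, outside temp) = {corr_MV:+.2f}")  # strongly negative
print(f"corr(oil burned, inside temp)  = {corr_MP:+.2f}")  # near zero
print(f"corr(outside temp, inside temp)= {corr_VP:+.2f}")  # near zero
```

Just as in the quote: the better the thermostat, the closer the oil–inside-temperature and outside–inside correlations get to zero, precisely because the furnace is doing its job.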
The mistake I’m criticizing here–misinterpreting correlations, and lack of correlations, among variables as evidence for or against causality when those variables are affected by density-dependent feedbacks–isn’t at all hypothetical, nor is it restricted to economics. Indeed, it’s a classic mistake in population ecology. Andrewartha and Birch reasoned more or less as in my hypothetical example above in their famous 1954 book, drawing on their work on thrips, which provided much of the motivation for the entire “density independent” school of thought in population ecology. Ziebarth et al. 2010 provide a really clear formal explanation of this mistake, and a modern test for density dependence that avoids it.
Note as well that you cannot avoid this mistake simply by looking at partial correlations among variables rather than raw correlations. Multiple regression, even sophisticated extensions of it like structural equation modeling, doesn’t make this problem vanish. Nick Rowe has a very clear explanation as to why in another old post.
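One way to see why (in a deliberately extreme, perfect-control version of the thermostat; this is my own toy setup, not Nick’s exact example): if the furnace exactly offsets the outside temperature, then oil burned is a deterministic function of outside temperature. The two predictors are perfectly collinear, so no multiple regression, partial correlation, or SEM can separate their effects on inside temperature; there is no independent variation left to exploit.

```python
import random
import statistics

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(3)
n = 10000
target = 20.0

V = [random.gauss(0.0, 5.0) for _ in range(n)]  # outside temperature
M = [target - v for v in V]                     # oil burned EXACTLY offsets outside temp
# Inside temp = outside temp + heat from oil + a shock the thermostat can't anticipate:
P = [v + m + random.gauss(0.0, 0.3) for v, m in zip(V, M)]

corr_PM = pearson(P, M)
corr_PV = pearson(P, V)
corr_MV = pearson(M, V)

print(f"corr(inside, oil)     = {corr_PM:+.3f}")  # near zero
print(f"corr(inside, outside) = {corr_PV:+.3f}")  # near zero
print(f"corr(oil, outside)    = {corr_MV:+.3f}")  # exactly -1: perfectly collinear
```

Here oil and outside temperature jointly determine inside temperature completely, yet neither correlates with it, and because they correlate perfectly with each other, “controlling for” one while estimating the effect of the other is mathematically impossible. Roughly speaking, a real (imperfect) thermostat relaxes this only by letting you learn from its mistakes, which is the point of that second post of Nick’s.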
Nor does the mistake here depend on the “thermostat” being perfect, or on the absence of stochasticity. Again, see that second old post of Nick’s for explanation.
Now this is the part of the post where I should list some recent examples of this mistake in ecology, thereby demonstrating that it’s a zombie idea (a longstanding, widespread mistake that should be dead, but isn’t). And I am indeed suspicious that this zombie is widespread outside of population ecology (I think it’s mostly been killed off within population ecology). For instance, I suspect that attempts like that of Cottenie (2005) to use variance partitioning to determine the causes of metacommunity structure are making this mistake.** Given that there’s intra- and interspecific density dependence within sites, I doubt that you can reliably infer the causes of metacommunity structure just by looking for statistical associations between environmental and geographic variables, and species abundances. Basically, anytime someone is just using correlations, partial correlations, variance partitioning, multiple regression, structural equation modeling, or related statistical methods to try to infer how causality works in a density-dependent dynamical system, there’s a decent chance that they’re forgetting about “Milton Friedman’s thermostat” and falling prey to basically the same zombie idea that once victimized Andrewartha and Birch. But just off the top of my head, I’m not thinking of any demonstrations from outside of population ecology of people making precisely this mistake. Which perhaps just means that there are some unrecognized examples of this zombie out there, waiting to be slain. Maybe commenters can identify some examples.
There are of course approaches for inferring causality from observational data on dynamical systems that work reasonably reliably under appropriate circumstances. But the ones with which I’m familiar mostly take as their starting point the fact that you’re dealing with a dynamical system that may well have density-dependent dynamics.
*Actually, no, but that’s a topic for another post.
**Approaches like that of Cottenie (2005) also have other serious problems (Gilbert and Bennett 2010).