Just ran across an interesting paper from international relations (Chaudoin et al. 2016), with potential application to ecology.* It’s about the problem of “selection on unobservables”, also known as the problem of shared causes. For instance, you can’t tell that joining an international human rights treaty causes countries to respect human rights, because some possibly-unobserved causal factor that drives compliance with the treaty might also drive the initial decision to join. In that case, the countries that join the treaty are those that would’ve respected human rights anyway. I’m sure you can think of analogous scenarios in ecology. Various methods have been proposed to deal with this and allow causal inferences from observational data (e.g., matched observations, statistical control using covariates, structural equation models, instrumental variables). But do those methods work in practice?
The linked paper takes an interesting approach to answer that question: it uses a standard causal inference method to estimate whether joining the World Trade Organization or the Convention on International Trade in Endangered Species (CITES) has a “causal” effect on variables that nobody thinks are causally affected by international trade or trade in endangered species. For instance, the paper asks if joining CITES causes a country to have a legislature. The authors find that membership in both treaties is estimated to have statistically and substantively “significant” effects on irrelevant variables an alarmingly high fraction of the time. Which suggests that standard methods of causal inference from observational data have alarmingly high false positive rates when applied to real world data (or else that researchers’ hypotheses about what causes what are completely useless).
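To make the logic of that kind of “placebo outcome” test concrete, here is a minimal simulation sketch. It is not the method or data from Chaudoin et al.; it just assumes a single unobserved shared cause, a binary “treatment” (e.g., joining a treaty), and an outcome that by construction does not depend on the treatment at all. The function name, parameters, and the |t| > 1.96 cutoff are all my own illustrative choices.

```python
import numpy as np

def placebo_false_positive_rate(n_units=200, n_sims=500, confounding=1.0, seed=0):
    """Fraction of simulations in which a naive regression finds a
    'significant' effect of treatment on an outcome it cannot affect."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        u = rng.normal(size=n_units)  # unobserved shared cause
        # Treatment (e.g., joining) depends on u plus noise.
        treated = (confounding * u + rng.normal(size=n_units) > 0).astype(float)
        # Placebo outcome depends on u plus noise, but NOT on treatment.
        placebo = confounding * u + rng.normal(size=n_units)
        # Ordinary least squares of the placebo outcome on the treatment.
        X = np.column_stack([np.ones(n_units), treated])
        beta, *_ = np.linalg.lstsq(X, placebo, rcond=None)
        resid = placebo - X @ beta
        sigma2 = resid @ resid / (n_units - 2)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_stat = beta[1] / np.sqrt(cov[1, 1])
        hits += abs(t_stat) > 1.96  # roughly a 5% two-sided test
    return hits / n_sims

if __name__ == "__main__":
    print("no shared cause:  ", placebo_false_positive_rate(confounding=0.0))
    print("with shared cause:", placebo_false_positive_rate(confounding=1.0))
```

With no shared cause the false positive rate should sit near the nominal 5%; with a shared cause it should be far higher, even though the true effect is exactly zero.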
I think it’d be very interesting to take a similar approach using ecological data and other methods of causal inference. For instance, if you fit structural equation models with some ridiculous causal structure to real ecological data, how often do you find “significant” and “strong” causal effects? And how well do the resulting SEMs fit observed ecological data, relative to the fit of SEMs based on “plausible” causal hypotheses? Has anyone ever done this in ecology? If not, it seems to me like low-hanging fruit well worth picking.**
Off the top of my head, I can think of a few ecology papers in the same spirit. For instance, Petchey et al. (2004) and Wright et al. (2006) tested whether conventional classifications of plant species into “functional groups” (e.g., C3 plants, C4 plants, forbs, etc.) are biologically meaningful. They did this by randomly reshuffling the functional groups into which real species were classified, and then checking whether the resulting ridiculous functional group classifications produce a significant relationship between functional diversity and ecosystem function. The answer is yes: randomly classifying species into biologically-meaningless functional groups often results in a “significant” relationship between functional group richness and ecosystem function, even after controlling for effects of species richness. And the relationship often is just as strong as the relationship with “real” functional groups. Which suggests that “real” functional groups aren’t so real after all. Ok, the Petchey et al./Wright et al. approach is slightly different than the one discussed above, in that it uses randomized data on possibly-relevant variables rather than non-randomized data on obviously-irrelevant variables. But the spirit is the same.
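A rough sketch of the reshuffling idea, for anyone who wants to try it on their own data. This is not the actual analysis from Petchey et al. or Wright et al.: the function names and the data layout (a plots-by-species presence matrix, one integer group label per species, one ecosystem function value per plot) are assumptions of mine, it uses a plain correlation, and it leaves out the control for species richness that those papers included.

```python
import numpy as np

def group_richness(presence, groups):
    """Number of functional groups present in each plot.
    presence: (plots, species) boolean array; groups: (species,) integer labels."""
    n_groups = groups.max() + 1
    # membership[s, g] is True when species s belongs to group g
    membership = groups[:, None] == np.arange(n_groups)[None, :]
    # a group counts as present in a plot if any of its species is present
    counts = presence.astype(int) @ membership.astype(int)
    return (counts > 0).sum(axis=1)

def shuffled_group_correlations(presence, groups, function, n_shuffles=200, seed=0):
    """Correlation between group richness and ecosystem function when
    species are randomly reassigned to groups (group sizes kept fixed)."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_shuffles)
    for i in range(n_shuffles):
        shuffled = rng.permutation(groups)
        out[i] = np.corrcoef(group_richness(presence, shuffled), function)[0, 1]
    return out

if __name__ == "__main__":
    # Entirely synthetic data, for illustration only.
    rng = np.random.default_rng(1)
    presence = rng.random((60, 12)) < 0.4
    groups = np.repeat(np.arange(3), 4)  # the "real" classification
    function = presence.sum(axis=1) + rng.normal(size=60)
    observed = np.corrcoef(group_richness(presence, groups), function)[0, 1]
    null = shuffled_group_correlations(presence, groups, function)
    print("observed r:", observed)
    print("share of shuffles at least as strong:", np.mean(np.abs(null) >= abs(observed)))
```

If a large share of the shuffled classifications do about as well as the “real” one, that’s the warning sign.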
UPDATE: In the comments, Sarah Cobey reminds us that she and Ed Baskerville recently used the Chaudoin et al. approach to test a causal inference method known as convergent cross mapping. It failed badly.
I think the same approach could be much more widely used in ecology. Don’t just use causal inference on observational data to detect causes that seem like they might be real. Make sure your approach doesn’t detect causes that definitely should not be real.
*One of the best parts of being me is that I get to type weird sentences like that one.
**And if this is a really stupid idea, hopefully Jim Grace will stop by in the comments and say so. 🙂 One operational definition of “blog” is “place where you can share half-baked ideas, so that people who know better than you can tell you why they’re only half-baked.”