Also this week: causal inference vs. Facebook, why read old papers, and more.
From Jeremy:
Recently in the comments Mark Vellend suggested that we poll readers on controversial ideas in evolution, as a complement to our poll on controversial ideas in ecology. I passed the suggestion on to Andrew Hendry, who’s a better person to do it–and he’s doing it! He’s soliciting suggestions as to what questions to poll on.
Various methods of causal inference from observational data fail to recover the results of randomized controlled trials of online advertising, even when you have as much observational data as Facebook has (note: link goes to an unreviewed preprint). The details are specific to online advertising, but I wonder how much the broader story generalizes. Is there any field in which observational methods like matching, difference-in-differences, regression discontinuity, instrumental variables, structural equation modeling, etc., have been shown in practice (not just in theory) to reliably recover causal effects estimated by randomized controlled experiments? And if you say that that's the wrong question to ask, on the grounds that experimental estimates of causality are also problematic, what general advice would you have for investigators who are faced with a serious discrepancy between observational and experimental estimates of some causal effect?
Why read old philosophy? (ht Marginal Revolution) The same question could be asked about old ecology–would it have a different answer? I agree with Meghan that there are old ecology papers that you can usefully reread, to get new insights and to remind yourself of old insights you'd forgotten. And I think there are some old papers that it's useful to read so that you're better able to recognize when "new" research is truly new vs. just old wine in new bottles (though obviously there are other ways to learn that, such as by reading histories of your field). And I think there's an element of different strokes for different folks here: reading old papers might work for some people but not others, somewhat like how reading blogs outside my field helps me think about ecology but wouldn't necessarily help others. But I also think that old papers that are worth rereading, by anyone, for any purpose besides historical interest, are fairly rare. Possibly, they're outnumbered by old papers that ecologists continue to read and cite out of habit rather than for some good reason.
How a shoddy, ad hoc statistical method entered the sports science mainstream. I may use this as a statistical vignette in intro biostats next term.
Theory vs. (statistically-sophisticated) empiricism in one meme. (ht @noahpinion)
Hey Jeremy, a couple of comments about MBI (which I had never heard of before reading about it here and then watching the Kristin Sainani video). There are clearly a bunch of problems with the claims that have been made for MBI being better than using 0.05. First, stating that MBI will lower both Type I and II errors is close to inexcusable, because there is an unavoidable tradeoff between Type I and II errors and the proponents should have known that. The fact that this claim seems to have arisen, at least in part, from not calculating Type II error rates properly doesn't help their case. Second, they glide across some very important issues, such as identifying "trivial" effect sizes. Third, the mathematical foundation and inferential limits of p-values are well established, and that's clearly not the case for MBI. So, what comes next shouldn't be seen as a defense of MBI.
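For concreteness, here is a rough numerical sketch of that tradeoff for a one-sided two-sample z-test (the sample size, effect size, and choice of test are arbitrary illustration values, nothing specific to MBI): holding n and the true effect fixed, tightening the Type I error threshold necessarily inflates the Type II error rate.

```python
# Alpha/beta tradeoff for a one-sided two-sample z-test (normal approximation).
# All numbers are arbitrary illustration values.
from scipy.stats import norm

n = 20                 # per-group sample size
effect = 0.5           # true standardized effect size (Cohen's d)
se = (2 / n) ** 0.5    # SE of the standardized mean difference

for alpha in (0.10, 0.05, 0.01):
    crit = norm.ppf(1 - alpha) * se              # rejection threshold for the observed difference
    beta = norm.cdf(crit, loc=effect, scale=se)  # P(fail to reject | the effect is real)
    print(f"alpha = {alpha:.2f} -> Type II error = {beta:.2f}")
```

Dropping alpha from 0.10 to 0.01 here roughly doubles the Type II error rate; the only ways to lower both error rates at once are more data or a bigger true effect.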
However, I want to address a couple of assumptions made by critics of MBI that I don't think should be accepted without careful thought. Two of the key "problems" identified by critics of MBI are (1) MBI generally results in higher false positive rates and (2) the standards of evidence are "loosened" at smaller sample sizes. First, higher false positive rates are always going to result in lower false negative rates, so whether a higher probability of a false positive is a good or a bad thing depends on how much lower the Type II error rate is (and your opinion about which of those two kinds of errors is worse). Any assessment of two methods that concludes one method is worse than the other based ONLY on a difference in Type I error rates has provided ZERO evidence in favour of one method over the other, unless you start from the assumption that Type I errors are worse than Type II errors. Second, there is a reasonable logical argument to be made for loosening the standards of evidence at lower sample sizes. It goes something like this:
1. Type I and II errors are equally problematic.
2. All things being equal (including maintaining the threshold for rejecting the null at 0.05), the probability of making a Type II error increases as sample sizes decrease.
3. Thus, to maintain some kind of balance between the Type I and II error rates, we should increase the threshold for rejecting the null as sample sizes decrease (a rough numerical sketch of this follows below).
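Here is the same sort of sketch applied to that argument (same arbitrary effect size and one-sided z-test as above): with the rejection threshold held at 0.05, the Type II error rate balloons as n shrinks, while the alpha that would make the two error rates equal grows well above 0.05. For this test the two error rates are equal when the critical value sits halfway between zero and the true effect, which gives alpha = beta = 1 - Phi(effect / (2 * SE)).

```python
# How the Type II error rate grows as n shrinks with alpha fixed at 0.05,
# and the alpha that would equalize the two error rates at each n.
# Effect size and sample sizes are arbitrary illustration values.
from scipy.stats import norm

effect = 0.5   # assumed true standardized effect size
alpha = 0.05   # conventional rejection threshold

print(" n   Type II error at alpha=0.05   balanced alpha")
for n in (100, 50, 20, 10):
    se = (2 / n) ** 0.5                          # SE of the standardized mean difference
    crit = norm.ppf(1 - alpha) * se              # rejection threshold on the observed difference
    beta = norm.cdf(crit, loc=effect, scale=se)  # Type II error at alpha = 0.05
    balanced = 1 - norm.cdf(effect / (2 * se))   # alpha at which Type I error = Type II error
    print(f"{n:3d}            {beta:.2f}                    {balanced:.2f}")
```

At n = 100 the conventional 0.05 threshold is already close to balanced, but by n = 10 the Type II error rate is around 0.70 and the balanced threshold is closer to 0.3 than to 0.05.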
So, I won't be using MBI, but not because of these two "problems".
Jeff