Note from Jeremy: Behavioral ecologist Tim Parker had some very interesting comments on my recent post about replicability in ecology vs. the social sciences. So here they are as a standalone post. Tim has unusually broad and deep knowledge about replicability across many different fields. So please do read on if you’re interested in this issue!
Thank you to Jeremy for calling attention to Alvaro de Menard's post on replication forecasts in social science. I want to say a few things about that project and a few things about replicability in ecology.
First, I urge anyone with an interest in the reliability of empirical science to read Alvaro's post. It's insightful and well written. I don't agree with everything in that post, but I won't get into a detailed critique here. I have been working on a parallel project (repliCATS), funded through the same DARPA program, to generate forecasts for the same 3000 social science studies. Our approach was different: we spent more time with each paper (20-30 minutes rather than 2.5 minutes), and we refined our forecasts through a collaborative process (the IDEA protocol). My experience working on roughly 100 of these forecasts leads me to conclusions that are broadly similar to Alvaro's. Also, as an ecologist, it was fascinating to dive into the social science literature with a critical eye, and to realize that I could actually generate replicability forecasts that often converged with those of domain experts. The lesson: understanding a few basic principles of study reliability can take you far in this process.
As for replicability in ecology, I expect that it varies considerably among sub-fields, because sub-fields differ in the features known to predict replicability, including two of the best predictors: sample size and the plausibility (prior probability) of the hypotheses tested. Small sample size is obviously an important cause of variability in effect size estimates, and I doubt anyone reading this needs convincing of that, but if you want an empirical demonstration, check out Fanelli et al. 2017 PNAS. The relationship between prior probability and replicability may be less obvious. In a null hypothesis testing framework, the easiest way to get a high false positive report probability (FPRP) is to test unlikely hypotheses (this has been widely discussed, but a frequently cited explanation is in Ioannidis 2005 PLOS Medicine). Social psychology suffers from both small samples and an attraction to unlikely (counterintuitive, exciting) hypotheses, and this probably explains its poor replicability in the big multi-study replication projects (e.g. OSC 2015 Science).
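To make the prior-probability point concrete, here is a minimal sketch of the standard FPRP arithmetic discussed by Ioannidis: among "significant" results, the fraction that are false positives depends on the prior probability that the tested hypotheses are true. The formula is standard; the specific alpha, power, and prior values below are my own illustrative assumptions, not estimates for any real field.

```python
# Minimal sketch of the false positive report probability (FPRP) argument.
# Parameter values (alpha = 0.05, power = 0.8, the priors) are illustrative
# assumptions, not estimates from any real literature.

def fprp(prior, alpha=0.05, power=0.8):
    """Probability that a 'significant' result is a false positive,
    given the prior probability that the tested hypothesis is true."""
    false_pos = alpha * (1 - prior)  # significant results when the null is true
    true_pos = power * prior         # significant results from real effects
    return false_pos / (false_pos + true_pos)

for prior in (0.5, 0.1, 0.01):
    print(f"prior = {prior:>5}: FPRP = {fprp(prior):.2f}")
# prior =   0.5: FPRP = 0.06
# prior =   0.1: FPRP = 0.36
# prior =  0.01: FPRP = 0.86
```

The pattern is the whole point: hold alpha and power fixed, and a field that habitually tests long-shot hypotheses will fill its journals with false positives even if every individual test is conducted perfectly.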
As an aside, one of the biggest obstacles I faced when estimating the replicability of social science studies was lacking sufficient information to form a strong prior. I think one of the reasons we can estimate replicability reasonably well in social psychology (Dreber et al. 2015 PNAS) is that our own lives give us a robust basis for assessing the plausibility of hypotheses.
Anyway, back to ecology. As someone trained as a behavioral ecologist, I am willing to identify that sub-discipline as the one with the closest parallels to the poorly replicable fields of social science, due both to small samples and to frequent tests of unlikely hypotheses. Although the small sample size problem is widespread in ecology, I suspect the unlikely hypothesis problem is less so; I'd be curious to hear what others think. I agree, however, that other obstacles to replicability in ecology include biological heterogeneity (temporal, spatial, taxonomic) and methodological heterogeneity among studies (which is why I'm a big fan of distributed experiments like NutNet and DRAGnet). One problem that plagues some corners of social science, but that I think may be less severe in parts of ecology, is the obsession with p = 0.05 as a threshold. I saw many social science papers with results hovering just below 0.05, and 'statistical significance' is poorly repeatable when your p-value is between 0.05 and 0.01. Moreover, when a study reports a suite of p-values falling in that range, that in itself is an unlikely event (even when the effect is real), and so is a red flag for p-hacking or selective reporting. I don't think I see as much of that in ecology broadly, though I haven't searched systematically.
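A quick simulation can show why a suite of p-values in that 0.01-0.05 window is suspicious even when effects are real. This is my own illustration, not anything from the replication projects: it assumes independent two-sided z-tests with 80% power at alpha = 0.05, which is a generous assumption for many of these literatures.

```python
# Simulation sketch: how often do p-values from REAL effects land between
# 0.01 and 0.05? Assumes independent two-sided z-tests at 80% power
# (illustrative assumptions, not estimates for any particular field).

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Effect size (in standard-error units) giving ~80% power at alpha = 0.05
delta = stats.norm.ppf(0.975) + stats.norm.ppf(0.80)  # ~2.80

n_sim = 100_000
z = rng.normal(loc=delta, scale=1.0, size=n_sim)  # test statistics under a real effect
p = 2 * stats.norm.sf(np.abs(z))                  # two-sided p-values

in_window = np.mean((p > 0.01) & (p < 0.05))
print(f"P(0.01 < p < 0.05 | real effect, 80% power) ~= {in_window:.2f}")         # ~0.21
print(f"P(four independent p-values all in that window) ~= {in_window**4:.4f}")  # ~0.002
```

Under these assumptions, only about a fifth of p-values from genuine effects land between 0.01 and 0.05, so a paper reporting, say, four independent results all in that window is reporting an outcome with probability on the order of 0.002. That is why a cluster of barely-significant p-values reads as a red flag.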
My last words for now: I suspect ecology has higher replicability on average than the social science disciplines with the poorest replicability, but I doubt we're in wonderful shape. That said, it will be hard to generate robust estimates of replicability across the discipline, because so many ecology studies would be difficult to replicate. Still, various people are working around the edges of this problem, and I think they will generate some useful insights soon. For instance, in the collaborative many-analysts project I'm working on, 171 analysts or analyst teams have submitted separate analyses of one of two ecology/evolutionary biology data sets, and this should help us understand the degree to which among-analyst variation in statistical decisions drives heterogeneity in results.