Hoisted from the comments: Tim Parker on replicability in ecology vs. the social sciences

Note from Jeremy: Behavioral ecologist Tim Parker had some very interesting comments on my recent post about replicability in ecology vs. the social sciences. So here they are as a standalone post. Tim has unusually broad and deep knowledge about replicability across many different fields. So please do read on if you’re interested in this issue!

***********

Thank you to Jeremy for calling attention to Alvaro de Menard’s post on replication forecasts in social science. I want to say a few things about that project and a few things about replicability in ecology.

First, I urge anyone with an interest in the reliability of empirical science to read Alvaro’s post. It’s insightful and well written. I don’t agree with everything in that post, but I won’t get into a detailed critique here. I have been working on a parallel project (repliCATS), funded through the same DARPA program, to generate forecasts for the same 3000 social science studies. Our approach was different – we spent more time with each paper (20-30 minutes rather than 2.5 minutes), and we refined our forecasts through a collaborative process (IDEA protocol). My experience with working on approximately 100 of these forecasts leads me to conclusions that are broadly similar to Alvaro’s. Also, as an ecologist, it was fascinating to dive into the social science literature with a critical eye, and to realize that I could actually generate replicability forecasts that often converged with those of domain experts. The lesson: understanding a few basic principles of study reliability can take you far in this process.

As for replicability in ecology, I expect that it varies considerably among sub-fields. This is because sub-fields differ in the features known to predict replicability, including two of the best predictors – sample size and plausibility / prior probability of hypotheses. Small sample size is obviously an important cause of variability in effect size, and I doubt anyone reading this needs convincing of that, but if you want an empirical demonstration, check out Fanelli et al. 2017 PNAS. However, the relationship between prior probability and replicability may be less obvious. For instance, in a null hypothesis testing framework, the easiest way to get a high proportion of false positives (FPRP – false positive report probability) is to test unlikely hypotheses (this has been widely discussed, but a frequently cited explanation is in Ioannidis 2005 PLOS Medicine). Social psychology suffers from both small samples and an attraction to unlikely (counterintuitive, exciting) hypotheses, and this probably explains its poor replicability in the big multi-study replication projects (e.g. OSC 2015 Science).
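To make the link between prior probability and false positives concrete, here is a minimal sketch of the standard FPRP calculation. The function name and the specific priors, powers, and alpha used below are illustrative assumptions, not values taken from any of the studies or papers mentioned above.

```python
def false_positive_report_probability(prior, power, alpha=0.05):
    """Share of 'significant' results that are false positives.

    prior: assumed probability that the tested hypothesis is true
    power: probability of a significant result when the hypothesis is true
    alpha: significance threshold (false positive rate under the null)
    """
    false_positives = alpha * (1.0 - prior)  # true nulls that reach significance
    true_positives = power * prior           # real effects that reach significance
    return false_positives / (false_positives + true_positives)


# Illustrative (hypothetical) combinations of prior plausibility and power.
for prior in (0.5, 0.1, 0.01):
    for power in (0.8, 0.2):
        fprp = false_positive_report_probability(prior, power)
        print(f"prior={prior:<5} power={power:<4} FPRP={fprp:.2f}")
```

Under these assumptions, lowering the prior or lowering the power both raise the FPRP, which is why small samples combined with unlikely hypotheses is such a bad pairing.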

As an aside, one of the biggest obstacles I faced when estimating replicability of social science studies came when I lacked sufficient information to generate a strong prior. I think one of the reasons we can estimate replicability reasonably well in social psychology (Dreber et al. 2015 PNAS) is that our own lives give us a robust basis for assessing the plausibility of hypotheses.

Anyway, back to ecology. As someone trained as a behavioral ecologist, I am willing to identify that sub-discipline as the one with the closest parallels to poorly replicable fields of social science – both due to small samples and frequent tests of unlikely hypotheses. Although the small sample size problem is widespread in ecology, I suspect that the unlikely hypothesis problem may be less so. I’d be curious to hear what others think. I agree, however, that other obstacles to replicability in ecology would involve biological heterogeneity (temporal / spatial / taxonomic) and difficulty with methodological heterogeneity (which is why I’m a big fan of distributed experiments like NutNet and DRAGnet). However, one problem that plagues some realms of social science, but that I think may be less severe in some realms of ecology, is the obsession with p = 0.05 as a threshold. I saw many social science papers with results hovering just below 0.05, and ‘statistical significance’ is poorly repeatable when your p-value is between 0.05 and 0.01. Also, when a study reports a suite of p-values falling in that range, that in itself is an unlikely event (even in the case of a real effect), and so is a red flag for p-hacking or selective reporting. Again, I don’t think I see as much of that in ecology broadly. However, I haven’t systematically searched.
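Both claims about p-values between 0.01 and 0.05 can be checked with a rough calculation. The sketch below assumes a two-sided z-test whose test statistic is normally distributed with unit variance around a true standardized effect (called `shift` here), and it assumes the results in a suite are independent. The 80% power figure and the suite of five results are hypothetical choices for illustration; all function names are my own.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def prob_p_below(threshold, shift):
    """P(two-sided p < threshold) when the z statistic is Normal(shift, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - threshold / 2.0)
    upper_tail = 1.0 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def prob_p_between(lower, upper, shift):
    """P(lower < p < upper) for the same z-test."""
    return prob_p_below(upper, shift) - prob_p_below(lower, shift)


def shift_for_power(power, alpha=0.05):
    """Approximate shift that gives the requested power (far tail ignored)."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) + STD_NORMAL.inv_cdf(power)


# 1. Repeatability: suppose the true effect equals an estimate that landed
#    exactly on p = 0.05. How often does an identical replication reach p < 0.05?
borderline_shift = STD_NORMAL.inv_cdf(0.975)
print(f"replication success at p = 0.05: {prob_p_below(0.05, borderline_shift):.2f}")

# 2. A suite of results in the 0.01 to 0.05 window, for a real effect
#    studied with a hypothetical 80% power.
single = prob_p_between(0.01, 0.05, shift_for_power(0.80))
print(f"one result in the window:  {single:.2f}")
print(f"five results in the window: {single ** 5:.4f}")
```

Under these assumptions a borderline result replicates only about half the time, and even a well-powered study of a real effect should land in the 0.01 to 0.05 window only a minority of the time, so a whole suite of results there is improbable. With no true effect (`shift = 0`) the chance of a single result falling in the window is exactly 0.04.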

My last words for now – I suspect ecology has higher replicability on average than the social science disciplines with the poorest replicability, but I doubt we’re in wonderful shape. That said, it will be hard to generate robust estimates of replicability across the discipline because so many ecology studies would be difficult to replicate. However, various people are working around the edges of this problem and I think they will generate some useful insights soon. For instance, the collaborative many-analysts project I’m working on has had 171 analysts or analyst teams submit separate analyses of one or the other of two ecology/evolutionary bio data sets, and this should help us understand the degree to which among-analyst variation in statistical decisions can drive heterogeneity in results.

10 thoughts on “Hoisted from the comments: Tim Parker on replicability in ecology vs. the social sciences”

Too often we try to control/limit variation to make stronger conclusions about ‘significance’ of results in experiments. Here is a nice example showing that purposefully including variability can improve reproducibility (https://www.nature.com/articles/s41559-017-0434-x).


Jeremy Fox on September 21, 2020 at 8:36 am said:

Interesting, hadn’t seen that. Thanks.


I’d be curious to know how often failures to replicate could ultimately be tied to some version or another of an ecological “Simpson’s paradox”.


Jeremy Fox on September 21, 2020 at 10:06 am said:

Can you elaborate a little, Andrew? I know Simpson’s paradox, but I’m not quite sure I see the connection to replicability.

Andrew Stoehr on September 21, 2020 at 10:16 am said:

Sure, sorry. Maybe this isn’t the right way to think about it, but here’s what prompted my question. In my own work (on butterfly wing patterns), I have some traits that I’ve found to be negatively correlated when I consider specimens collected across an entire “year” (March to November). However, the same traits are positively correlated within a “season”, i.e. the traits are positively correlated among specimens collected in spring, and also among specimens all collected in the hottest part of summer. But if you include spring AND summer, they’re negatively correlated. Now I’m thinking about previously published studies that sometimes find positive, sometimes negative, and sometimes zero correlation among these traits, and that often try to find explanations for results that, maybe, are entirely consistent with analyses conducted at different “levels” of that across vs. within problem. The different patterns still require an explanation, of course, but failure to replicate across studies might not be due to fundamental biological or methodological differences (besides the “level of analysis”) among the studies. The previous post about whether genetic variation is included is what got me thinking about this “across vs. within” question. Hopefully that clears up what I was getting at (even if it reveals that I’m missing something fundamental!)
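The across vs. within pattern described here can be reproduced with a tiny toy example. The numbers below are entirely made up for illustration and are not butterfly data; `trait_a` and `trait_b` are hypothetical trait measurements, and the correlation helper is written out by hand to keep the sketch self-contained.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical specimens: trait_a is higher in summer, trait_b is lower.
spring_trait_a = [1, 2, 3, 4]
spring_trait_b = [6, 8, 7, 9]
summer_trait_a = [6, 7, 8, 9]
summer_trait_b = [1, 3, 2, 4]

within_spring = pearson_r(spring_trait_a, spring_trait_b)
within_summer = pearson_r(summer_trait_a, summer_trait_b)
across_seasons = pearson_r(spring_trait_a + summer_trait_a,
                           spring_trait_b + summer_trait_b)

print(f"within spring:  r = {within_spring:+.2f}")
print(f"within summer:  r = {within_summer:+.2f}")
print(f"across seasons: r = {across_seasons:+.2f}")
```

In this toy data set the correlation is positive within each season but negative once the seasons are pooled, because the seasonal shift in trait means runs opposite to the within-season relationship. Two studies sampling at different "levels" could therefore report opposite signs without either being wrong.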

Thanks for this post, Tim! I’m curious about the unlikely hypothesis problem you allude to as applied to behavioral ecology–could you say a bit more about what you mean?


Tim Parker on September 21, 2020 at 11:12 am said:

Hi Ambika. First, let me be clear that I am shooting from the hip here. As I acknowledge in the post, I have not systematically examined the ecology literature, so what I have are my own informal impressions.
That said, the kinds of things that I sometimes see in behavioral ecology papers that drive down my informal estimate of prior probability include:
-scenarios that seem surprising to the reader. I think that some of the behavioral ecology papers that receive the most attention in the popular press fall into this category.
-scenarios that contradict a priori understanding or expectation (these are probably fewer, but I can think of some)
-interaction terms which require complex interpretation and which are not clear predictions of sound theory, but rather require ad hoc invocations of complexity
I am not going to mention specific papers here for obvious reasons.
I want to be clear that I am not saying that surprising findings are wrong. Only that they will be more likely to be wrong than are unsurprising findings.


https://twitter.com/DavidMShuker/status/1308315398232772608



Dynamic Ecology

Multa novit vulpes
