Poll results: how replicable do ecologists think ecology is, and why?

Last week I polled y’all on how replicable you think ecology is, and on the sources of lack of replicability. Here are the responses!

tl;dr: There’s more disagreement about this than about any other topic we’ve polled on.

We got 118 responses; thanks to everyone who responded! Not a huge sample, and surely not a perfectly-representative sample of ecologists, or even of readers of this blog. But as usual with our polls, it’s a sufficiently large and representative sample to be worth talking about.

The first question asked readers to give the percentage chance that a typical ecological study would replicate, where “replicate” means “getting a statistically significant result of the same sign as the original, using either the same data collection process and analysis on a different sample, or the same analysis on a similar but independent dataset.” Here’s a histogram of the responses:

The responses were all over the map. But the distribution isn’t perfectly uniform. If you squint a bit, you’ll see that it looks trimodal. There are replication optimists–a peak of respondents who think the typical ecological study has a 70% chance of replicating. There are replication pessimists–a peak of respondents who think the typical ecological study has only a 20% chance of replicating. And there are replication, um, pessoptimists–a peak of respondents who think the typical ecological study has a 50% chance of replicating. Ok, the trimodality might just be a blip, but I doubt that the broad spread of the distribution is a blip. Ecologists disagree a lot about whether the typical ecological study would replicate.

Which as an aside kind of surprises me. I mean, meta-analyses are a thing! Every respondent to this survey has presumably read numerous meta-analyses, which should give you a sense of how often two different studies of the same topic produce a statistically-significant effect of the same sign. As a reader of a bunch of meta-analyses, I feel like the replication pessimists are just incorrect.

Or maybe I shouldn’t be surprised by the level of disagreement here. I mean, it’s not as if most ecologists have done what Tim Parker has done: gone through a systematic exercise to estimate the replicability of a whole bunch of ecological studies. So maybe it’s not surprising that casual guesses about the replicability of the typical ecological study are all over the map.

Perhaps what’s going on here is that many respondents were just focusing on studies in their own subfield of ecology? So the pessimists work in less-replicable subfields, whereas the optimists work in more replicable subfields? Now I’m kicking myself for not asking respondents to say what subfield they work in.

My second question asked what fraction of ecological studies are unlikely to replicate, defined as having less than a 33% chance of replicating. Here’s a histogram of the responses:

As with the previous question, the responses were all over the map, again with a hint of trimodality.

I worry a little bit that the responses were all over the map for this second question in part because the question wasn’t sufficiently clear. A minority of respondents’ answers to this question were inconsistent with their answers to the first question. If you think that the typical ecological study has (say) an 80% chance of replicating, you can’t also think that >50% of ecological studies have less than a 33% chance of replicating. That’s mathematically impossible!
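To see why, here is a quick back-of-the-envelope sketch in Python (it is not part of the original poll analysis). It assumes that “typical” means the mean replication probability, and the function name `max_fraction_below` is made up for illustration. If a fraction f of studies sits below the 33% threshold, the mean can be at most f*0.33 + (1 - f)*1, and that caps f.

```python
def max_fraction_below(mean_prob, threshold=0.33):
    """Upper bound on the share of studies whose replication probability is
    below `threshold`, given a mean replication probability of `mean_prob`.

    Worst case: every study below the threshold sits just under it and every
    other study replicates with probability 1, so
        mean_prob <= f * threshold + (1 - f) * 1
    which rearranges to f <= (1 - mean_prob) / (1 - threshold).
    """
    if not 0.0 <= mean_prob <= 1.0:
        raise ValueError("mean_prob must be between 0 and 1")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be at least 0 and below 1")
    # The bound can exceed 1 when the mean is below the threshold; cap it.
    return min(1.0, (1.0 - mean_prob) / (1.0 - threshold))


# Hypothetical respondent who said the typical study has an 80% chance.
cap = max_fraction_below(0.80)
print(f"At most {cap:.1%} of studies can be below the 33% threshold")
```

With a mean of 80%, at most about 30% of studies can fall below the 33% threshold. If “typical” instead means the median, the cap is 50%. If it means the mode, there is no cap at all, so a few of the seemingly inconsistent answers might reflect a different reading of “typical”.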

The third question asked respondents about various reasons why an ecological study might fail to replicate. For each reason for replication failure, respondents were asked if it’s involved in many or all failures to replicate, some, or few/none. Here are the responses:

According to the respondents, there are four primary reasons for replication failure in ecology, with the most common being that the original result can only be obtained under specific conditions or in a specific, unusual study system. That seems right to me. It’s what meta-analysts call “heterogeneity”, and we know it’s a big deal (Senior et al. 2016). Following closely behind heterogeneity, in the view of the poll respondents, are lack of power, p-hacking, and publication bias. Many fewer respondents see “bad luck” as a common reason why ecological studies fail to replicate. And few respondents think that fraud is involved in an appreciable fraction of failures to replicate.

My own answers to the third question would’ve broadly agreed with those of the bulk of the respondents, had I taken my own poll. Except that I don’t think publication bias is much of a thing. Ecological meta-analyses routinely test for it and rarely find it.

There weren’t any obvious associations between respondents’ answers to the third question, and their answers to the first two questions. For instance, the rare respondents who think fraud is involved in some or many failures to replicate were all over the map in terms of how likely they think it is that the typical ecological study would replicate.

Bottom line: ecologists as a group have little idea how replicable ecology is, or why. What, if anything, should we do to remedy that?

25 thoughts on “Poll results: how replicable do ecologists think ecology is, and why?”

  1. Well….my opinion on potential remedies: publish more meta-analyses….and teach junior and less experienced ecologists (who might well be seniors!!) about them.

    • See, that’s what’s puzzling me a bit, because we already publish lots of meta-analyses, and hardly any ecologists are unaware of meta-analyses or think they’re unimportant (https://dynamicecology.wordpress.com/2019/11/04/poll-results-the-many-ways-ecologists-seek-generality/). But based on the results in today’s post, it seems like there must be some ecologists out there who are well aware of meta-analyses, but whose thoughts on replicability are disconnected from their thoughts on meta-analysis. So I don’t think the issue is unawareness of meta-analyses. The issue is not seeing that meta-analyses have implications for discussions of replicability.

      • Am confused about this comment. I could say many ecological studies will not replicate because they are system-specific and underpowered, but that meta-analyses are useful to find generality. In principle, a convincing meta-analysis can be built up entirely from studies that are each so poorly powered they individually have <33% chance of replicating (if you have enough of them). That's a big benefit of the approach right?
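A rough sketch of the point in the comment above, in Python. It assumes a deliberately simplified setup that is not from the post: every study estimates the same true effect with the same standard error (a fixed-effect model with no heterogeneity), results are judged with a two-sided z-test at alpha = 0.05, and the chance of replicating is approximated by the power to get a significant result of the correct sign. The function names and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_same_sign(effect, se, alpha=0.05):
    """Probability that a z-test is significant with the same sign as the
    true effect (assumed positive), for an estimate with standard error `se`."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The estimate divided by se is normal with mean effect/se and sd 1.
    return 1 - NormalDist().cdf(z_crit - effect / se)


def pooled_power(effect, se, k, alpha=0.05):
    """Same power calculation for the pooled estimate from k equally precise
    studies: under the fixed-effect assumption its se is se / sqrt(k)."""
    return power_same_sign(effect, se / sqrt(k), alpha)


# Hypothetical numbers: the true effect equals one standard error per study.
single = power_same_sign(effect=1.0, se=1.0)
pooled = pooled_power(effect=1.0, se=1.0, k=16)
print(f"single study: {single:.2f}, 16 studies pooled: {pooled:.2f}")
```

Under these assumptions, a single study has well under a 33% chance of producing a significant same-sign result, while pooling 16 such studies gives power above 95%. Real ecological meta-analyses also have to contend with heterogeneity among study systems, which this sketch ignores.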

    • I wonder if what’s going on is that many ecologists think decline effects are common and strong. That is, they think that the first study of any given effect/topic often reports a massive, statistically-significant effect size, due to some combination of publication bias, p-hacking, and choice of an unusual study system. Then later studies in other study systems will fail to replicate the initial result.

      If that’s right, more ecologists should have a look at my #ESA2020 talk. 🙂 The “decline effect” is mostly not a thing in ecology, at least not any more. https://dynamicecology.files.wordpress.com/2020/08/esa-2020-slides.pdf

  2. Is there a selection bias in using meta-analyses to determine replicability? The few meta-analyses I’ve read I think tended to dismiss low-quality studies or other studies where the authors, a priori, had reason to think the results or analysis may not be reliable. Though I may just be misremembering these details.

  3. I’m surprised by the answers to the fraud question (though I’m now also realizing I’m confused by the graphic — do the bars not necessarily add to 100%?). Considering the growing amount of fraud discussions in other fields (and the Pruitt mess currently ongoing), I’d have expected to see more people in the “some” camp rather than the “few to none”.

    Of course, as Alvaro discussed in his other post, there’s a whole host of issues in talking about fraud, including drawing the boundaries for what even counts as fraud. I imagine that different definitions could result in different answers.

      • I’m not surprised by the answers to the fraud question, but I am heartened. Like you, I was wondering a bit if the #pruittdata and #perchgate scandals would lead to widespread worries that ecology is rife with fraud. But it seems that most ecologists still think–correctly–that fraud is rare. Anecdotally, it’s my sense that #pruittdata in particular is widely seen as an extraordinary case. Not as the tip of a huge iceberg of similar undiscovered cases.

        The y-axis scale in the figure is # of respondents, not % of respondents. So the three bars for each cause of non-replication should total up to the number of respondents (118), not to 100. (And in a couple of cases, they don’t quite sum to 118 because of a couple of non-responses.)

        Yes, it’s interesting that Alvaro and I looked at a lot of the same data and came to rather different conclusions on the prevalence of scientific fraud: https://dynamicecology.wordpress.com/2020/02/17/some-data-and-historical-perspective-on-scientific-misconduct/ Or maybe not different conclusions so much as different emphasis. As you say, the differences mostly come down to the fact that Alvaro wants to define many common, questionable research practices as low-grade fraud.

      • Ahh, got it — the “100” top line on the graph (plus a not-so-careful read) got me all mixed up.

        I suppose my thinking lines up more with Alvaro’s at the moment. I think the existence of big, obvious, flagrant fraudsters suggests that there’s a host of smaller and more discreet fraudsters that we’ll never discover. A lesson from Elisabeth Bik is that getting a journal to retract an obviously flawed paper is nearly impossible (see the Space Dentist update https://scienceintegritydigest.com/2020/08/06/an-update-on-the-space-dentist-papers/), so I don’t know that you can look at the current retraction rate as a strong indicator of how many papers might be impacted by QRPs and fraud in general.

        To make sure I’m clear — I think my answer in the poll for the % of papers that would replicate was something like 60-70%. But of those ~30% that don’t, I wouldn’t be too surprised to learn that ~5% are the product of fraud or obvious misconduct. In my mind, that’s a “some” and not a “few to none” amount of fraud, but I imagine that other people could be playing with the same numbers and disagree!

      • “In my mind, that’s a “some” and not a “few to none” amount of fraud, but I imagine that other people could be playing with the same numbers and disagree!”

        Yes, this crude poll obviously doesn’t really convey exactly how common any respondent thinks fraud is. FWIW, the respondents who said fraud contributes to “some” or “many/all” non-replications also answered “some” or “many/all” for several other contributing factors too. So none of the respondents think that fraud is *more* common than those other causes of non-replication.

        So, if ~30% of ecology papers wouldn’t replicate, and ~5% of those are frauds, then that would imply that 5%*30%=1.5% of all ecology papers are fraudulent. I think that’s too high to be correct. I’d guess well under 1% myself. Your calculation implies that the vast majority of frauds in ecology go totally undetected. That the fraction of fraudulent ecology papers that ever get even suspected of fraud (say, in PubPeer comments), never mind actually retracted, is very low. And since a disproportionately large fraction of frauds are perpetrated by long-term serial offenders, your calculation implies that there are dozens or even hundreds of totally unsuspected serial fraudsters in ecology, doesn’t it? I mean, even if undetected serial fraudsters write more papers than honest scientists do, there’d have to be a *lot* of them if they’re writing anywhere close to 1.5% of all ecology papers. I could certainly believe that there’s more than one undetected serial fraudster in ecology. But dozens? Hundreds?

      • I think that length of a career, prominence in the field, and number of publications are all positively correlated with the odds a fraudster is caught. All else being equal, misconduct is more likely to be uncovered when someone has a large number of higher-impact papers. So while I certainly agree that the frauds we uncover are long-term serial offenders, I wouldn’t think that these people are the typical case. Rather, they are drawn from the high tail of the distribution of fraudsters: I think that we tend to catch the most extreme and flagrant cases, but plenty of lower-key, lower-impact fraudsters will never be discovered. That 2% number (% of scientists who admit to ever having fabricated or falsified data) you mention in your older post feels like it’s in the right ballpark to me (though if I were placing bets, I’d go a bit higher), and most of those people are enjoying entirely unassuming careers and will never be discovered.

      • That 2% number from surveys might well be an overestimate for ecology. Most surveys are very small and non-random samples from US biomedicine. The one survey with a really big random sample (also from US biomedicine) found well under 1% of respondents admitted fraud.

        There’s also the “lizard man” factor. When you’re surveying to estimate the prevalence of some very rare behavior or viewpoint, even a very small proportion of jokesters can screw up your estimate. Something like 4% of Americans say yes when asked if the world is controlled by “lizard men”. Presumably, they don’t really believe that, they’re just saying they do for the lulz. So if even a very small percentage of scientists say “yeah, I’m totally a fraudster ha ha” in anonymous surveys, that’ll inflate survey-based estimates of the prevalence of fraud.

        You’re right that the serial fraudsters who get caught after having become well-established, prominent researchers aren’t typical. They’re a minority of all the fraudsters who get caught, and perhaps you’re right that they’re an even smaller minority among all fraudsters. I agree with you that, if your back of the envelope math is going to work, it must be the case that there’s an appreciable number of undetected “low-key” fraudsters out there. Non-prominent researchers who’ve faked one or two low-impact papers.

        Could be! I don’t think so, but I wouldn’t bet my life that I’m right.

        One thing that would help would be more big random anonymous surveys of this, that cover someone besides US biomedical researchers.
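The “lizard man” point above can be illustrated with a little arithmetic, sketched here in Python. The assumptions are purely illustrative and not taken from any survey: true fraudsters all answer honestly, a fraction `joke_rate` of everyone else falsely says yes, and the function names and example numbers are hypothetical.

```python
def observed_rate(true_rate, joke_rate):
    """Share of respondents who say 'yes, I committed fraud' when all true
    fraudsters admit it and a fraction `joke_rate` of the rest say yes anyway."""
    return true_rate + (1.0 - true_rate) * joke_rate


def implied_true_rate(observed, joke_rate):
    """Invert observed_rate: the true rate consistent with an observed 'yes'
    share, floored at zero when jokesters alone could explain the result."""
    if not 0.0 <= joke_rate < 1.0:
        raise ValueError("joke_rate must be at least 0 and below 1")
    return max(0.0, (observed - joke_rate) / (1.0 - joke_rate))


# Hypothetical numbers: 2% say yes in the survey, 1% of honest people joke.
estimate = implied_true_rate(observed=0.02, joke_rate=0.01)
print(f"implied true rate: {estimate:.2%}")
```

With a hypothetical 2% of respondents admitting fraud and a 1% joke rate, the implied true rate drops to roughly 1%. The sketch ignores the opposite bias, fraudsters who don’t admit it even anonymously, which would push the estimate the other way.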

      • Definitely agreed across the board! The other thing I think is interesting is that I don’t know how much the exact % of fraudulent papers matters — while I’d obviously prefer that fraud is 0% and we caught every case of misconduct, I don’t know that my behavior when presented with a new paper is going to be different whether my prior is a 99% or 95% chance that the research was conducted entirely in good faith.

      • Agreed–I’m not sure it’s *that* important if the true percentage of fraudulent papers is, say, 0.1% instead of 0.5%. It’s just as you say–there are no important consequences to small changes in the number. There’s some range of low percentages for which it’s sensible to default to assuming every paper you read is non-fraudulent. (Unless you have some good concrete reason to think otherwise, of course.)

        “I’d obviously prefer that fraud is 0%”.

        I wouldn’t! Not because I think it’s better to have some non-zero amount of fraud, all else being equal, but because all else isn’t equal. The optimal percentage of fraudulent papers is >0%, *taking into account the costs of the anti-fraud measures that would be necessary to reduce the percentage to 0*. See here: https://dynamicecology.wordpress.com/2020/03/04/scientific-fraud-vs-financial-fraud-the-canadian-paradox/

        Whether we currently have the optimal level of fraud in science, I’m not sure. It’s hard to say. But it’s quite possible we do, or that we’re pretty close. (For “optimal level of fraud”, one should of course read “optimal training, safeguards, policies, etc. to prevent/detect/punish fraud”. The level of fraud is not itself a parameter that we can dial up or down. It’s an equilibrium outcome that reflects other parameters that we can dial up or down.)

  4. I only learned about meta-analyses as a PhD student because I was asked to TA a lab where this was one of the assignments, and I couldn’t agree more that it is really puzzling that they are not emphasized more.
    Was it really necessary to include the poll’s second question? I tried calculating what my answer should be given my answer to the first question, but quickly got stuck.
    A comment about trimodality: I feel that there are some particular round numbers that we, as humans, are attracted to when answering these types of questions. 70/50/30 seem to be the best choices to represent the options: mostly A, not sure, mostly B.

    • “Was it really necessary to include the poll’s second question? I tried calculating what my answer should be given my answer to the first question, but quickly got stuck.”

      Here’s what I was thinking when I was writing the poll. The first question asks about the typical value of a distribution (whether different respondents thought of “typical” as mean, median, or mode, I’m not sure…). The second question is to do with the shape of the distribution: what fraction of the distribution is below 33% replication probability? I figured I ought to ask both questions because they get at different aspects of the distribution. But perhaps I could’ve asked them in a clearer way.

      And yes, you’re absolutely right, many of the respondents picked round numbers, which could contribute to generating an appearance of trimodality out of an underlying unimodal distribution of opinion. People are very unsure what the right number is, so people in the middle all gravitate to 50, people in the right tail gravitate to 70 (because that’s the round number that’s centered between “more than 50 but less than 100”), and people in the left tail gravitate to 30 (“less than 50 but more than zero”).

  5. I think publication bias is a problem with meta-analyses. Negative results are not only harder to publish, but less attractive for career building. A meta-analysis might find, say, that there is no publication bias in BEF studies because there’s a relatively even spread across results. It seems to me that this estimate doesn’t take into account the much higher probability that negative results (particularly no-relationship results) will never be submitted, or will go unpublished due to review problems. Not that meta-analyses don’t still have tremendous value, but I am not convinced this isn’t a problem.

    • Hmm. Except that meta-analyses in ecology routinely test for publication bias with funnel plots and Egger’s regressions, and they mostly don’t find any evidence of publication bias as far as I can tell from the (many) meta-analyses I’ve looked at.

      I mean, my intuition is the same as yours–that non-significant results will tend to go unpublished. But maybe that intuition is mostly wrong!

      I’d actually be curious to know: what are the ecological meta-analyses that have found the biggest publication biases in funnel plots or Egger’s regressions?

      • Good question, I don’t know. To be honest I don’t remember reading about this in any of the meta-analyses I have read (not because it wasn’t there, necessarily). Koricheva and Gurevitch (2014) reviewed meta-analysis use in plant ecology and their section on publication bias is quoted below. It would be interesting to know what the case is more generally.

        “Publication bias occurs when the probability of publication depends on the statistical significance, magnitude or direction of the effect. If exists, it can bias any synthesis or review of the literature, including narrative reviews. Several methods have been developed to test for publication bias in the meta-analysis, ranging from simple graphical tools (e.g. funnel plots) to statistical tests and calculations of ‘fail-safe numbers’. Jennions & Møller (2002a) tested for publication bias in 40 published meta-analyses in ecology and evolution and found that publication bias may affect the main conclusions in 15–21% of meta-analyses. Methods for testing for publication bias are incorporated in all meta-analytic software. However, the majority (61%) of meta-analyses in plant ecology did not include any tests for publication bias (Table 3) or mention the term ‘publication bias’ in the paper. This means that we cannot be certain how robust the results of these analyses are to possible publication bias. Among the remaining third of meta-analyses which did test for publication bias, the majority used funnel plots (scatter plots of effect sizes vs. sample size or variance). However, funnel plots are an inaccurate and unreliable method for assessing publication bias (e.g. Terrin, Schmid & Lau (2005), Jennions et al. (2013)). Nakagawa & Santos (2012) recommended the use of a modification of the funnel plots which has better properties (Peters et al. 2008). Jennions & Møller (2002a) tested for publication bias in ecological meta-analyses by using ‘trim and fill’ method (Duval & Tweedie 2000), which allows one not only to test, but also to adjust for publication bias. This approach has been so far very seldom used in meta-analyses in plant ecology.”

        Koricheva, J. and Gurevitch, J., 2014. Uses and misuses of meta-analysis in plant ecology. Journal of Ecology, 102(4), pp.828-844.

      • Interesting that Koricheva & Gurevitch suggest the use of trim and fill. I seem to recall reading a paper showing that it performs very badly. But my memory is vague on that.

  6. This kind of thing makes me think that the only way we’d be able to get consensus on what the true replication rate is would be a systematic study directly estimating it using studies from multiple different subfields, along the lines of the Reproducibility Project in psychology. Though obviously, this would be a much more difficult project to perform in ecology.

    • Oh, that’s the reference! I had seen that tweet but didn’t know what paper it was quoting. (And I had forgotten that I gave the authors feedback on it!).

      Yes, the juxtaposition of the Acknowledgments with the paragraph to which you refer is indeed…striking.
