Should publishers flag replication failures in the same way they flag retractions?

In psychology, published replication attempts have no detectable effect on the rate at which the original paper is cited, whether or not the original result replicated. That’s at least in part because the replications themselves are hardly cited. Andrew Gelman discusses this result, and similar results in other fields, including a radical proposal: studies that fail to replicate should be flagged by publishers, just as retractions are. The analogy is to how the WestLaw database doesn’t just show lawyers all the court cases on issue X, but also indicates which cases have been cited approvingly, affirmed, or overturned in appeals and subsequent cases.

To which: I dunno man.

The analogy to WestLaw is an interesting one that I need to think more about. But right this second, I feel like scientists have to do their job, which includes knowing the literature. It’s not, and shouldn’t be, up to publishers to know the literature for them.

Plus, are we sure that not knowing the literature is really the problem here? Is “lack of awareness of failed replications” the reason why the replications aren’t cited, and the reason why the original studies continue to be cited? I kind of doubt it.

Which brings me to my next thought: I want to know more about how the original studies are cited, before I get too worked up about the lack of effect of replication failures on their citation rates. For instance, in ecology, Joe Connell’s original 1978 Science paper coining the intermediate disturbance hypothesis still gets cited a ton, even though the IDH hasn’t stood up to empirical or theoretical scrutiny (Fox 2013). But almost all of those citations these days are either from obscure journals, or else they’re throwaway, “fill in the box” citations from papers that often aren’t even about the intermediate disturbance hypothesis. I’m not too bothered that refutations of the IDH haven’t filtered down to obscure journals, and haven’t stopped people from using Connell (1978) as a throwaway citation. This is very much in contrast to citations in court decisions. In court decisions, AFAIK there’s no equivalent of the sorts of throwaway citations that comprise a decent fraction of all the citations in any given ecology paper.

Finally, what exactly counts as a “replication attempt” that publishers ought to flag? Do “conceptual replications” count? Effect sizes in ecology are so heterogeneous (Senior et al. 2016) that I question whether ecology ever has replication attempts that are so exact that they ought to be flagged by publishers in the same way that retractions are flagged. “This study failed to replicate, so you should consider it to be refuted and not cite it, just like how you wouldn’t cite a retracted study” just doesn’t seem like the kind of thing ecologists ever ought to say. You’d end up flagging, and refusing to cite, every study in ecology, including the replications. Again, this seems like a contrast to court cases on WestLaw. There’s no ambiguity as to whether a court decision was overturned or upheld on appeal.

To be clear, I do think that ecologists can and should get better and quicker at abandoning scientific ideas that haven’t panned out, or aren’t likely to pan out. I just don’t think that we’re going to improve on that front by building a search engine or database that automatically flags studies that failed to replicate, however “replicate” is defined. But I could be wrong! Looking forward to hearing what you think.

8 thoughts on “Should publishers flag replication failures in the same way they flag retractions?”

  1. Absolutely not. What counts as failure to replicate results is not at all clear, and journals are in no position to identify it. Even claims that some idea has been theoretically overturned are not all that solid. Having published theoretical models that demonstrate strong intermediate disturbance results, I don’t agree with Jeremy’s claim that the IDH has failed theoretical examination. We might debate this, which could be fun, but should TREE decide to flag Jeremy’s paper on the grounds that it doesn’t replicate my papers? Or should the Bulletin of Mathematical Biology or the Biol. J. Linnean Society flag my papers because they don’t agree with (“replicate”) Jeremy’s? I would strongly disagree with either of those actions.

    So I definitely agree with the conclusion that having search engines or databases that decide what constitutes failure to replicate and then flag it would be a very bad idea.

  2. I noticed recently that some Wiley publications have a “Citation Statements” section, which indicates how many of the citations are ‘supporting’, ‘mentioning’, or ‘contrasting’. You can see an example here, in a publication that explains the feature: https://onlinelibrary.wiley.com/doi/10.1002/leap.1379

    I’m intrigued by the idea, though, like you, I’m not sure how much of an effect it will have.

    • Yeah, I’m intrigued by those algorithmically generated citation statements. I have no idea how good the algorithm is, though the results seem directionally correct (most citations are “mentioning”, followed by “supporting”; “contrasting” citations are very rare).

  3. Agree with everything said. Ecologists need to get better at killing off ideas. But publisher “replication failure” statements are not the way. The reason it is not the way (and why ecologists are bad at killing off ideas) is that so much in ecology is contingent on conditions (taxa, ecosystem, weather good or bad, predators high or low, etc.). Therefore one instance that contradicts a theory might be bad news for the theory. Or bad news for the experimenter who encountered weird experimental conditions. No way publishers should be wading into deciding that.

    Ecologists just need to get better at letting evidence accumulate on an idea (which we do pretty well), and then boiling down a summary of the evidence that says this idea works or doesn’t (or is somewhere in between – works some of the time). We’re really bad at that. Partly because ecologists don’t like to offend each other (I really think we’re a more conflict-avoidant field than most academic fields). Partly because we don’t incentivize the work of building summary positions. And partly because the contingent nature of ecology always leaves wiggle room for people who don’t want to disbelieve. Confirmation bias runs way further in ecology than it does in, say, physics.

    But those are excuses. We do need to get better at making ideas go extinct.

  4. A false positive in the original study is only one of many valid reasons why a replication might “fail”. This is just as true in experimental psychology as it is in the messier experiments of ecology. With the exception of the most controlled lab experiments in, e.g., genetics, suggesting papers be flagged (as “refuted”) because of one or even a few failed replications is a misunderstanding of the reproducibility literature. And the failure to reproduce can be very informative.

    Bruna, E.M., Chazdon, R., Errington, T.M. and Nosek, B.A. (2021), A proposal to advance theory and promote collaboration in tropical biology by supporting replications. Biotropica, 53: 6-10. https://doi.org/10.1111/btp.12912

  5. While scientists should be aware of the literature around a topic, given the number of papers that are published, it’s just not realistic to expect them to read everything, and even the most diligent researcher will miss papers. I think there is value in doing straight replications (perhaps a nice way for master’s students to get actual research experience with a clear path), and they should be flagged, not only for failures to replicate but for successes too. In fact, I went so far as to suggest that journals should be obligated to publish replications of their papers.

    “Although novelty is essential for science to advance, science builds on work that has come before; thus, replication is equally essential to ensure reliability. Therefore, we should not rely on a single publication and place an undue emphasis on novelty. This emphasis leads to absurd situations in which attention-grabbing work is published in a high-profile journal, while a failed replication of that same work is not considered because of a lack of novelty. This results in an asymmetry, in which novel but incorrect research can have a higher impact than less original but correct research. One unfortunate consequence of this asymmetry is the reluctance of scientists to do the important work of replicating previous studies.

    I would like to propose that journals should have an obligation to publish scientifically sound replications of work that they have previously published. In addition, building on my previous point, replications should also be linked. Linking would allow readers to see whether someone has attempted to replicate a paper and the result. Scientists being aware that journals will publish replications should help address the problem in which negative results are seldom published, which is important because simulations have shown that publication of negative results is important to prevent incorrect results being accepted as fact. My first three proposals combined would result in a much clearer view of the reliability of a specific piece of knowledge.”
    Bosch J. Four proposals for a more reliable scientific literature. South African Journal of Science 2018;114:3–4. https://www.sajs.co.za/article/view/4811
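
    A minimal toy sketch of the kind of simulation the quoted passage alludes to (illustrative only; the false-positive rate, power, and “acceptance” margin below are assumptions, not parameters from the cited paper). The idea: a claim counts as “accepted as fact” once published positive results outnumber published negative results by a set margin; positives are always published, negatives only with some probability.

    import random

    # Toy model: a community keeps studying one claim and "canonizes" it once
    # published positive results outnumber published negatives by `margin`.
    # Positive results are always published; negative results are published
    # only with probability p_pub_negative.
    def claim_gets_accepted(claim_is_true, p_pub_negative, n_studies=50,
                            alpha=0.05, power=0.8, margin=3):
        balance = 0  # published positives minus published negatives
        for _ in range(n_studies):
            positive = random.random() < (power if claim_is_true else alpha)
            if positive:
                balance += 1
            elif random.random() < p_pub_negative:
                balance -= 1
            if balance >= margin:
                return True
        return False

    random.seed(1)
    runs = 20000
    for p_neg in (0.0, 0.1, 1.0):
        rate = sum(claim_gets_accepted(False, p_neg) for _ in range(runs)) / runs
        print(f"P(publish a negative result) = {p_neg:.1f}: "
              f"false claim accepted in {rate:.1%} of simulated literatures")

    In this toy version, a false claim gets “accepted” in a sizeable fraction of runs when negative results are never published, and essentially never when they always are; that is the point of the quoted passage, not a substitute for the actual simulations it cites.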

  6. Sometimes we read papers where it’s hard to understand what the authors claim they have “fully proved”. Next, it’s very hard to include a critical remark in, say, a “literature review” section; anonymous reviewers (but are they those former authors?) and editors or publishers, most of them belonging to common, discrete, and similarly thoughtful cliques, usually do not like such comments. Thus, in order to avoid time-consuming confrontations, let’s all of us quote the incorrect claims with the statement “so-and-so dares to claim that…”. What do you think?
