This is old (1982!), but I wasn’t aware of it before and I’m guessing you weren’t either. Peters & Ceci took a dozen psychology papers published in leading, highly-selective journals, substituted fake author names and affiliations, and resubmitted the papers to the same journals 18-32 months after the originals were published. Only three of the twelve papers were detected as duplicates, and eight of the remaining nine were rejected, in many cases on grounds of serious methodological flaws.
We’ve talked in the past about other approaches to quantifying the randomness in pre-publication peer review. And I’ve argued that pre-publication review is, like democracy, the worst system except for all the others. So I’m not surprised at these results, or even hugely bothered by them. The grounds cited for rejecting those eight papers are somewhat troubling, though, as they suggest that limiting pre-publication review to questions of “technical soundness” would not actually make review outcomes more reproducible.
But it’s a small dataset, from a different field, so I wouldn’t hazard a guess as to what the data would look like if someone were to do the same experiment in ecology. It would be good to try it with PLOS ONE as well as with selective journals. And to try it with rejected papers as well as accepted ones, though of course obtaining a random sample of rejected papers would be tricky.
Anyone want to try the experiment? It wouldn’t be that hard…
(HT Leonardo Saravia, via Twitter)
This reminds me of the technique of looking at rejection rates as a proxy for the rigor of a scientific discipline. The idea is that if rejection rates are high in a leading journal, this indicates a lack of consensus on valid methodology and important questions. If the rejection rate is low, it means that authors and reviewers both know and agree on what constitutes a significant finding. Fields like philosophy or sociology have extremely high rejection rates, while in fields like biochemistry they will be much lower, even in leading journals. I learned about this in A Critique for Ecology, which is a few decades old – does anyone know if this technique has been replicated more recently?
Hmm…I doubt that’s actually true. The journal Cell has very high rejection rates, for instance. So do leading general science journals like Science, Nature, and PNAS. So do leading medical journals. Etc. I don’t recall Peters’ data on this (did he even have any data?), so I can’t comment too much on his specific claims. But in general, I suspect the relationship between rejection rates of leading journals, “rigor” of a field (by which we might mean “validity or replicability of the field’s papers”), levels of agreement among workers within a field, and what those working within a field agree or disagree *about*, is rather complicated.
I’d love to see it repeated in (the field of) ecology and in PLOS ONE. I’d predict that specifically asking reviewers to focus only on technical soundness would actually improve the replicability of published results, since they wouldn’t have to worry about perceived novelty or fit with a particular journal.
Of course, all this could also give some useful insight into the average level of technical proficiency of reviewers in a given field.
“Of course, all this could also give some useful insight into the average level of technical proficiency of reviewers in a given field.”
Ah, that’s yet another interesting treatment variable, or really variables. How much do the results of this experiment depend on the content of the paper? Do you get more disagreement, or a more positive or negative response on average, if you use fancy stats? If it’s an experimental paper? Etc. Really, there’s a career’s worth of projects here.
If one were to try this in ecology, I’d also suggest changing the study organism(s) / system, since that can be a giveaway.
Yeah, except that you can’t change that without changing the content of the paper. Basically, I think if you tried this experiment in ecology, you’d have to live with a much higher proportion of papers getting recognized as duplicates.
Another way to run the experiment would be to submit the same paper to several journals simultaneously and look at the variance of their evaluations.
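As a rough sketch of what that variance comparison might look like — all scores and journal names below are invented for illustration, and the 1–5 scale is just one way to code reviewer recommendations:

```python
from statistics import mean, pvariance

# Hypothetical reviewer scores (1 = reject ... 5 = accept) for the SAME
# manuscript submitted simultaneously to three (fictional) journals.
scores = {
    "Journal A": [2, 3],
    "Journal B": [4, 5],
    "Journal C": [1, 4],
}

all_scores = [s for js in scores.values() for s in js]
grand_mean = mean(all_scores)

# Between-journal variance: how much the journals' mean verdicts disagree.
between = pvariance([mean(js) for js in scores.values()])

# Within-journal variance: average disagreement among a journal's own reviewers.
within = mean(pvariance(js) for js in scores.values())

print(f"grand mean score:         {grand_mean:.2f}")
print(f"between-journal variance: {between:.2f}")
print(f"within-journal variance:  {within:.2f}")
```

If between-journal variance dwarfs within-journal variance, the “randomness” lives mostly in which journal (or editor) you happen to draw rather than in individual reviewers — which is exactly the kind of thing the simultaneous-submission design could tease apart.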
I suspect you’re right. And if one went with the latter option, the paper’s topic should be sufficiently broad that the same reviewer(s) don’t end up with multiple versions!
If you’re going to do this experiment, then you have to be prepared to waste some people’s time. Indeed, you might well need to get ethical approval from an institutional review board.
I am afraid that the randomness might be getting worse. I have noticed that nearly all journals I have reviewed for recently lack the recommendation “major revision”, even ones that, as I recall, previously had it. It has been replaced by “reject with possibility to resubmit”. This often leaves me in a sort of quantum state, choosing between minor revision and reject, and might thus increase the randomness of the actual recommendation. This might cause some papers to slip through without a necessary second round of reviewing, and others to be rejected that would previously have made it. Actually, journals really should have done some experiments on this (say, by paying multiple reviewers to assess the same papers with different recommendation options) and tested the variance of the outcomes.
My own experience is slightly different. I have the impression that some journals (especially Am Nat and Ecol Lett) are using “major revision” in cases that used to be “minor revision”. I assume as a device to make sure that authors actually take the revisions seriously. I find this annoying. As an editor, you can always say in your decision letter that acceptance of the ms is conditional on the authors revising the ms to your satisfaction, even if the required revisions are minor.