In finance and economics, the answer is: pretty random. Ivo Welch has a new open access paper based on data from eight leading economics and finance journals. He shows that different referees typically exhibit only modest agreement in their recommendations, and that disagreement is not just a matter of some referees being pickier than others. Rather, disagreement arises in large part from differences among referees in their relative ranking of papers. If you think of referees as trying to estimate some “objective” (i.e. referee-agreeable) attribute of a paper, then referee decisions turn out to be one part “signal” (i.e. 1/3 dictated by that attribute) and two parts referee-specific “noise”. Discussion from James Choi here.
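To get a feel for what "one part signal, two parts noise" means for referee agreement, here is a toy simulation. It is a minimal sketch and not Welch's actual method. It assumes the 1/3 can be read as a share of variance, that each referee's score is a shared paper attribute plus independent referee-specific noise, and that everything is normally distributed. The function names (`simulate_referee_pairs`, `pearson`) and the sample size are made up for illustration, and differences in referee pickiness are deliberately left out.

```python
import math
import random


def simulate_referee_pairs(n_papers, signal_share, seed=0):
    """Simulate two referee scores per paper under a toy signal-plus-noise model.

    Assumption (not from the paper): total score variance is 1, of which
    `signal_share` comes from a paper attribute both referees see, and the
    rest is independent noise specific to each referee.
    """
    rng = random.Random(seed)
    signal_sd = math.sqrt(signal_share)
    noise_sd = math.sqrt(1.0 - signal_share)
    first, second = [], []
    for _ in range(n_papers):
        quality = rng.gauss(0.0, signal_sd)  # shared, "referee-agreeable" part
        first.append(quality + rng.gauss(0.0, noise_sd))
        second.append(quality + rng.gauss(0.0, noise_sd))
    return first, second


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


first, second = simulate_referee_pairs(n_papers=20000, signal_share=1 / 3)
agreement = pearson(first, second)
print(f"correlation between two referees on the same paper: {agreement:.2f}")
```

Under these assumptions the correlation between two referees' scores on the same paper should come out close to the signal share itself, so 1/3 signal corresponds to only modest agreement between any two referees, even though both are responding to the same underlying attribute.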
It’d be very interesting to do a similar study on ecology journals, eh? The purported crapshoot nature of peer review is something everybody (including me) has strong opinions about, based only on the small, non-random samples provided by their own personal experiences. Now that online manuscript handling systems are ubiquitous, I’ll bet the (appropriately anonymized) data would be really easy for journals to provide.
So, who wants to take the lead in approaching the journals and asking for the data? Come on, Chris, this is right up your alley!
And assuming someone can obtain the data, anybody want to place any bets on whether ecology referees’ recommendations are more or less random than those of economics and finance referees?
UPDATE: I tried to write the post so as to avoid revealing my own view on whether 1/3 signal is “bad”, “good”, “the best we can do”, or whatever. But in the comments, I’ve been misread as arguing for PLoS ONE’s editorial system (which asks referees to judge papers only on technical soundness). So let me reveal my own view. If, as I’d guess, ecology reviews also are about 1/3 “signal”, I’m mostly ok with that, for two reasons.
First, referee agreement or disagreement on whether a paper should be published isn’t, or shouldn’t be, the most important thing about reviews. Handling editors don’t, or shouldn’t, just count referee “votes”. They should use the reviews to inform their own judgments on how to make the paper better, and yes, on whether to accept the paper. That was my practice when I was a handling editor.
Second, as discussed in previous posts (see here, here, and here), we are never going to do away with judgments about what work is “interesting” or “important”. Further, and importantly, there is just as much disagreement on those judgments among post-publication “reviewers” (i.e. readers) as there is among pre-publication reviewers. So as a reader with very limited time to read, I’m mostly ok with letting selective journals and their referees do a lot of “filtering” of the literature for me, as I’m used to this filtering system and I think the alternatives would be worse. Just because referees and editors of selective journals are making judgment calls, and just because a fair bit of reasonable disagreement is possible on those judgment calls, does not mean those judgment calls should not be made, or that they’d be made “better” (as opposed to merely “differently”) by some other system. But I’m old and set in my ways, so I would say that. ;-)
UPDATE #2: Just wanted to highlight an ongoing exchange of comments with ace commenter Jim Bouldin. Jim expressed a view that I suspect is widely shared: the level of randomness and subjectivity in referee recommendations is appallingly high, and has no business affecting the direction of science. Commenter Mike Taylor expressed a similar view. To which I responded: referees are us. If referees disagree a lot about what science is most worth paying attention to, well, that just means we all disagree with one another a lot about what science is most worth paying attention to. Those disagreements will not go away, or become any less “random” or “subjective”, or stop affecting the course of our science, if we reform or replace the current pre-publication peer review system. That’s not necessarily an argument against any proposed reform of peer review, of course. It’s just a clarification of what any proposed reform could accomplish. “Eliminating or substantially reducing the random and subjective elements of our collective decisions concerning what science to pay attention to” is not a feasible goal for peer review reform, or for reform of any other aspect of current scientific practice. You can shift around where the randomness and subjectivity enter into our collective decision-making process about what science is most worth paying attention to, and there may well be good arguments that they should only enter the process at certain points, but you can’t eliminate them. I don’t mean my comments on this as the last word by any means (Jim has indicated that he’s planning to reply in due course). But because many readers read the posts but not the comments, I wanted to highlight this ongoing discussion and encourage readers to follow it, as I suspect it’s of particular interest to many of them.