Retraction Watch has the story of a large correction to a recent ecology paper. The paper estimated the cost of invasive plant species to African agriculture. The cost estimate was $3.66 trillion, which turns out to be too high by more than $3 trillion. The overestimate was attributable to two calculation errors, one of which involved inadvertently swapping hectares for square kilometers. Kudos to the authors for correcting the error as soon as it was discovered.
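For context on the scale of that kind of slip: a hectare is 0.01 km², so mixing up hectares and square kilometers scales any area-based total by a factor of 100. The story doesn't spell out the paper's exact calculation, so the toy illustration below uses made-up numbers, not the paper's figures.

```python
# Toy illustration of a hectare / square-kilometre mix-up (made-up numbers,
# not the paper's actual calculation). 1 km2 = 100 ha.
HA_PER_KM2 = 100

area_ha = 2_000_000       # hypothetical invaded area, recorded in hectares
cost_per_ha = 50.0        # hypothetical cost in $ per hectare

correct_cost = area_ha * cost_per_ha
# Misreading the same figure as km2 and then converting it to hectares
# inflates the total a hundredfold:
mistaken_cost = (area_ha * HA_PER_KM2) * cost_per_ha

print(mistaken_cost / correct_cost)   # -> 100.0
```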
But should the authors have found the error earlier? After all, as the linked story points out, the original estimate of the agricultural cost of invasive plant species–$3.66 trillion–is much larger than Africa’s entire GDP. The calculation error was discovered after a reader who didn’t believe the estimate repeated the authors’ calculations and got a different answer. But it’s not as if the authors were careless. They’d already double-checked their own calculations. Mistakes happen in science. And sometimes those mistakes pass through double-checking.
This isn’t the first time something like this has happened in ecology. Here’s a somewhat similar case from a few years ago.
Which raises the question that interests me here: what should you do if you obtain a result that seems like it can’t be right? Assume that the result merely seems surprising or implausible, not literally impossible. It’s not that you calculated a negative abundance, or a probability greater than 1, or calculated that a neutrino moved faster than the speed of light. Ok, obviously the first thing you’re going to do is double-check your data and calculations for errors. But assume you don’t find any–what do you do then?
I don’t know. I find it hard to give general guidance. So much depends on the details of exactly why the result seems surprising or implausible, and exactly how surprising or implausible it seems. After all, nature often is surprising and counterintuitive! In the past, we’ve discussed cases in which ecologists had trouble publishing correct papers, because reviewers incorrectly found the results “implausible”. I don’t think it’d be a good rule for scientists to never publish surprising or unexplained results.
Here’s my one concrete suggestion: I do think it’s generally a good idea to compare your estimate of some parameter or quantity to the values of well-understood parameters or quantities. Doing this can at least alert you that your estimate is implausible, implying that you ought to scrutinize your estimate more closely. I think such comparisons are a big improvement on vague gut feelings about plausibility. So yes, I do think you should hesitate to publish an estimate of the effect of X on African agriculture that massively exceeds African GDP, even if you can’t find an error in your estimate.
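To make that suggestion concrete, here is a minimal sketch of such a plausibility check. The numbers are rough placeholders (including the GDP figure), not the paper's actual values; the point is only the mechanics of comparing an estimate to a well-understood reference quantity.

```python
# A minimal sketch of the plausibility check described above.
# The figures are rough placeholders, not the paper's actual numbers.
def plausibility_check(estimate, reference, estimate_label, reference_label):
    """Compare an estimate to a well-understood reference quantity and flag it if it exceeds the reference."""
    ratio = estimate / reference
    if ratio > 1:
        print(f"Warning: {estimate_label} is {ratio:.1f}x {reference_label}; "
              f"scrutinize the calculation before publishing.")
    else:
        print(f"{estimate_label} is {ratio:.0%} of {reference_label}; "
              f"not obviously implausible on this check alone.")

# e.g. an estimated agricultural cost vs. a rough continent-wide GDP figure
plausibility_check(3.66e12, 2.8e12,
                   "estimated cost of invasive plants",
                   "Africa's GDP")
```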
But it can be hard to implement that suggestion. Because your own subjective judgments as to what’s “implausible” are pretty flexible, even when disciplined by comparisons to well-understood data points. Humans are great rationalizers. Once you’ve double-checked your implausible-seeming result, you’re probably going to start thinking of reasons why the result isn’t so implausible after all. Everything is “obvious”, once you know the answer. For instance, as I said above, I feel like that massive overestimate of the effect of invasive species on African agriculture probably shouldn’t have been submitted for publication in the first place. The estimate is just too implausible. But is that just my hindsight bias talking? I don’t know.
Which I guess just goes to show why we have peer review. Your own subjective judgments as to what’s “implausible” are different than other people’s. So at the end of the day, all you can do is double-check your work as best you can, then let others have a look at it with fresh eyes. All of us working together won’t be perfect. But hopefully we’ll catch more errors than if we all worked alone.
Have you ever found a result that seemed like it “must” be wrong? What did you do? Looking forward to your comments.
I recently taught a post-graduate course where I started by quizzing the students on their ability to use logic to estimate what the right answer ‘should’ be.
Using factually straightforward, but non-obvious, questions, I gave them multiple-choice options that varied over orders of magnitude (e.g. how many mammal species are there in Africa: about 120, 1200, 12000, or 120000 species?). The idea was not to test their general knowledge, but to see whether they could apply logic picked up through experience to narrow the potential options to what seems most reasonable.
The course was actually about the post-2020 Global Biodiversity Framework, but I reasoned that the students wouldn’t be able to evaluate the feasibility of global targets unless they also had some ball-park idea of what needs to be conserved. A target like “reduce extinction risk by at least 10%” is quite meaningless to someone who doesn’t roughly know how many species there are and what proportion are threatened by extinction.
I suspect the authors (and editors, and reviewers) of the paper mentioned in the post got so swept up by the large numbers that they didn’t even stop to try and put it in context.
But from a scientific point of view, they would have picked up their mistake had they performed a basic sensitivity analysis on their estimates. One would think that tweaking each parameter by 1% and exploring the outputs would’ve shown that their total cost estimate was especially sensitive to the ‘per hectare’ estimate…
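For what it's worth, here is a minimal sketch of that kind of one-at-a-time check, run on a made-up toy cost model. The paper's actual calculation isn't reproduced here, so the parameter names and values are placeholders.

```python
# One-at-a-time sensitivity sketch on a toy cost model (placeholder values only).
def total_cost(params):
    # Toy model: invaded area (ha) x fraction affected x cost per hectare
    return params["area_ha"] * params["fraction_affected"] * params["cost_per_ha"]

baseline = {"area_ha": 5e8, "fraction_affected": 0.2, "cost_per_ha": 50.0}
base = total_cost(baseline)

for name in baseline:
    tweaked = dict(baseline)
    tweaked[name] *= 1.01                      # perturb this parameter by 1%
    change = (total_cost(tweaked) - base) / base
    print(f"{name}: +1% input -> {change:+.2%} change in total cost")

# In this simple product every parameter has an elasticity of ~1, so each +1% tweak
# moves the total by ~1%. The point of the exercise is to see which inputs the total
# actually hinges on, and to notice when the baseline itself is an implausible number.
```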
That seems like a very useful exercise to put your students through. How do they do on it?
“I suspect the authors (and editors, and reviewers) of the paper mentioned in the post got so swept up by the large numbers that they didn’t even stop to try and put it in context.”
Yeah, that’s my guess too.
The students did OK; neither great nor terrible. But the most value came from talking through each of the questions afterwards to try and unpack the thinking process that led to the closest answer. In almost every case, the students did a good job in reducing the options from four to two plausible answers, even if they had to resort to guesswork for their final choice.
I think the exercise also helped them distinguish between different degrees of uncertainty around estimates. For example, they realised that it is OK to assume that there are 1200 mammal species in Africa even though they might not know what the *exact* number really is, while also realising that it is probably not OK to assume that there are only 120 species. I hope that they now appreciate that there are varying degrees of ‘wrongness’, and that information can still be useful even if not 100% correct.
I always ask how many species there are in the family Aphididae and the Class Mammalia 🙂
This reminded me of how, when I was an undergraduate, I was shocked to read that picture-winged flies and bustards were in the same family! Only later did I realise that the insect family is spelt Otitidae and the bird family Otididae… clearly I lack the attention to detail needed to be a taxonomist.
Sure seems like there was a failure in the review process in this one too. The reviewers or editors could have requested to see the data to do a check, for example. They could have saved the authors some embarrassment and work.
A year or so ago, I reviewed a paper for a post-doc and the values in the tables didn’t line up with the values in the figures. I flagged that and sent it back along with other corrections. I re-reviewed and the corrections were made except for the data issue. The author said there was no mismatch, but didn’t explain the reason for the differences. I explained what appeared to be the data mismatch in a different way on the second review and sent it back again, with a note specifically to the editor. The second revision came back to me again, still with the same problem. I returned it to the editor again, and by that time I was pretty frustrated that the problem wasn’t either fixed or explained, so I told the editor that I wasn’t going to look at it again and that as far as I was concerned the paper wasn’t publishable, but of course that was up to the editor.

My point here is mainly that we still have to be really diligent and detailed as reviewers despite the multitudes of other things on our desks. There are times when I have misgivings about something in a paper but can’t really identify the issue. As a reviewer, I think it’s worth mentioning those even if you aren’t sure it’s an issue.
The postscript to the story is that the author was followed by someone that I follow on Twitter. After the second review, the author (who I’ve never interacted with) popped off on Twitter about “reviewer 2”, or whichever reviewer I was, including a quote. (Shrug. Whatever.) I saw it because the person I follow liked or commented on his tweet. After the third review, the author, to his credit, got back on Twitter, admitted there was an error, and thanked the anonymous reviewer. (Yay Peer Review.) I don’t remember if it was transcription or scaling or transformation, but it was all fixed, and the author now won’t have to post a correction, because I’m fairly certain others would have eventually found the problem.
I’m certain peer review experiences like this happen all the time (except the Twitter part). Most errors are probably resolved after the first review. It’s too bad that wasn’t the case here.
Agree, this kind of thing (and also outright mistakes) gets caught often at the peer review stage.
That’s a funny story about the Twitter interaction. Glad it has a happy ending. Hopefully the author in question learned to be a bit more cautious about popping off about “reviewer 2”.
Unfortunately, I bet many more people saw, liked, and retweeted the author’s original complaints than saw, liked, or retweeted the author’s retraction of those complaints. And I bet complaints like that get shared, liked, and retweeted much more often than people tweeting positive things about peer review. Even though we know from survey data that most scientists mostly have very positive experiences with peer review. Social media has many uses, but “giving an accurate impression of the full distribution of everyone’s experiences” is not one of them.
I wrote a complicated model of natural selection, and discovered that the selected allele seemed to spend about as much time going down in frequency as going up. You better believe I looked for an error in my code for a long, long time.
Finally I realized I was modeling the stationary process. So, most of the time the favored allele is fixed. Occasionally it comes unfixed for a bit due to a new mutation, and goes down in frequency. Then selection pushes it back to fixation. Drift pushes it down by some random amount, and then it rebounds by that exact amount, giving the impression that it’s just as happy going down as up…. So my code was right but the process I was modeling was totally not the one I was interested in (the fate of a new favorable mutation).
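If it helps to see the distinction, here is a minimal Wright-Fisher-style sketch of the stationary regime described above. This is not the commenter's actual model; the parameter values are arbitrary, and the point is only that down-steps and up-steps balance out when the favoured allele hovers near fixation.

```python
# A minimal Wright-Fisher-style sketch of the stationary regime described above.
# Not the commenter's actual model; parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(1)

N = 1000            # haploid population size (number of gene copies)
s = 0.05            # selective advantage of the favoured allele
mu = 1e-3           # symmetric mutation rate between the two alleles
generations = 100_000

p = 1.0             # start with the favoured allele fixed
ups = downs = 0

for _ in range(generations):
    # selection
    p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
    # symmetric mutation keeps reintroducing the disfavoured allele
    p_mut = p_sel * (1 - mu) + (1 - p_sel) * mu
    # binomial sampling = genetic drift
    p_next = rng.binomial(N, p_mut) / N
    if p_next > p:
        ups += 1
    elif p_next < p:
        downs += 1
    p = p_next

print(f"generations with frequency increases: {ups}, decreases: {downs}")
# At stationarity the favoured allele hovers at or near fixation; mutation and drift
# nudge it down and selection pushes it back up, so down-steps and up-steps come out
# roughly balanced over a long run, even though selection strongly favours the allele.
# That is quite different from tracking the fate of a new favourable mutation
# starting at low frequency.
```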