Friday links: behind the scenes of the first 17 months of #pruittdata, another serious data anomaly in EEB, and more (UPDATED)

Also this week: supervillain vs. DNA, science vs. philosophy of science, the optimal level of scientific fraud, and more.

From Jeremy:

Another week, another serious data anomaly in EEB. This one relates to Thompson and Newmaster 2014 Biodiv Conserv. First author Ken A. Thompson is trying to get the paper retracted or, failing that, to have himself removed as an author. Ken didn’t collect the raw data and never saw it at the time. He was provided summary information to write up by his then-undergrad supervisor, Guelph professor Steven Newmaster. Ken has now looked at the underlying raw gene sequence data on which the study was purportedly based, and those data appear to be from other studies entirely. Some of the sequenced genes actually came from a site 500 km away from where Ken was told they were collected. And Ken himself can’t reproduce the results in the paper from the data that purportedly underpin the paper. I’m not an expert on gene sequence data, but having read all the supporting material Ken provided, it certainly looks to me like there are serious anomalies here. I hope an explanation will be forthcoming. UPDATE: see the comments for further details from Ken A. Thompson himself that I didn’t include in my brief summary, as well as some new information about the case. /end update

Also this week in “EEB data anomalies”: Am Nat EiC Dan Bolnick pulls back the curtain on the first 17 months of #pruittdata. A long blow-by-blow of what happened behind the scenes, from the person who was at the center of events. Includes an unredacted copy of a letter Pruitt’s lawyers sent to Dan, which includes numerous demonstrably false statements about Dan. Good times. I knew most of this already, because I was heavily involved with the investigation from early on. But if it’s new to you, grab a coffee and settle in. You should read this; it’s important (and riveting). You will come away tremendously impressed by how Dan handled this situation. We all owe him enormous thanks. Of course, while #pruittdata is (hopefully) over for Dan, it’s not over. Presumably, one of these days/months/years (?) McMaster University’s investigation will actually conclude. With all due respect to the lawyers conducting the investigation, and recognizing that doing it right is more important than doing it fast, I don’t understand why even the most meticulous investigation wouldn’t have concluded by now. The length of the investigation has now exceeded the mean and median length of previous investigations into similar cases, by a few months and counting…

Sticking with Dan Bolnick, here’s Dan thinking out loud about when, and why, to retract a paper. The answer depends on what purpose(s) you think retraction should serve. Includes some fascinating historical examples.

Which scientific disciplines cite philosophy of science? Here’s the data.

This next link is mostly here to illustrate the weird connections my brain likes to draw…We’ve talked in the past about how there’s some socially-optimal level of scientific fraud, and it’s not zero. Now, here’s Matt Levine pointing out that there’s some optimal level of fraud for the fraudster, and it’s not “as much fraud as possible”:

What if your company’s business is crime? What if you are a Mafia family, or a ransomware hacking group? Clearly the optimal level of crime is not zero: If your business is crime, you have to do crime to get revenue. But the optimal level of crime is also not infinite: If you do too many crimes, or crimes that are too bad, you will get in too much trouble. Doing more and bigger crimes should increase your crime-based revenue, but it also increases the resources that officials will expend on trying to shut you down, and thus your risks of being stopped and punished. At some point the lines cross, and you should forego some criminal revenue in order to keep your legal risk manageable.

Ok, Matt Levine’s not actually talking about scientific fraud, he’s talking about the recent ransomware attack on a US oil pipeline. But the point generalizes, I think. Do people who commit scientific fraud ever think about this point? (Again, I know that is a weird question to ask!)

Honestly, I can see where this supervillain is coming from. 🙂

Teacher evaluation form for the Spring 2021 semester. 🙂 😦

And finally, since everyone I heard from seemed to like last week’s musical link, here’s another one for you:

Have a good weekend. 🙂

13 thoughts on “Friday links: behind the scenes of the first 17 months of #pruittdata, another serious data anomaly in EEB, and more (UPDATED)”

  1. Thanks for including my situation in the links, Jeremy. Just commenting with a few small clarifications and updates.

The locality information and all of the species’ presence/absence information are almost identical to those from an unpublished Master’s thesis (the Webster thesis).

After I raised concerns at the University of Guelph, my co-author began to upload to GenBank (after previously claiming in writing that he was unable to locate it) the gene sequence data that was apparently used to generate the primary data sheet that he provided me with in 2013/2014.

According to Dr. Paul Hebert (see comment on my blog post), who knows more about this than I do (as well he should, being the [more-or-less] creator of barcoding), the recently archived sequence data bear a surprisingly strong resemblance to samples from the “Canada-wide plant barcode library”, which is publicly available data. Although we claim in the paper that the data were generated at the Canadian Centre for DNA Barcoding, Dr. Hebert (presumably because he can see the internal data files that I do not have access to) indicated that this is not the case.

    Moreover, I’ve since discovered that there are 32 species in the newly archived genetic data that are not in my primary data but were identified in the Webster thesis. Puzzling.

    This morning I became aware of yet another serious issue with the recently archived genetic data that I am in the process of investigating.

It’s important to point out that the two institutions with jurisdiction here – the University of Guelph and the journal Biodiversity and Conservation – both declined to investigate when I brought these concerns to them. With Guelph especially, it is difficult for me to express how disappointed I am with their failures here. In no case did they speak with me, nor did they provide any justification whatsoever for declining to proceed with an investigation.

Like you, I hope an explanation is forthcoming. I also hope that the difficulties I have had with blowing the whistle on my own paper highlight how poorly suited our current institutions are for adequately resolving cases like this. I believe we need a central ethical review board in Canada, as is the case in Sweden.

I will continue to post updates using my name via PubPeer.

    Ken A. Thompson

A few years ago a 747 plane ran out of fuel over Red Lake ON. The reason was that a switch from imperial to metric units had recently occurred, and the fuel order was given in imperial units but delivered in metric. The plane effectively had insufficient fuel to reach its destination. One could blame the person who fuelled the plane, but the pilot took full responsibility for not doing due diligence and checking the fuel load prior to takeoff. Since the pilot of a plane is willing to take full responsibility for the plane, I think senior authors of papers need to take full responsibility for all aspects of their papers.

Given this, I think a clear explanation is that Ken, as senior author of the paper, like the pilot of the 747, failed to do due diligence. I say this for two reasons:

Firstly, Ken acknowledges me as the source of the NEBIE plot network data. While it is true that I lead the NEBIE plot network, I can assure the readers of these posts that Ken did not contact me at any time prior to publication to confirm that he was using NEBIE plot network data. While it is common etiquette to request permission to use someone’s data prior to publication, Ken never made that request.

Secondly, the DNA data is in the public domain, and again it appears from this post that Ken did not confirm the source until after publication.

If Ken had done due diligence, would there be an issue? Probably not.

Thus, a take-home message from all this is that the senior author of a paper shouldn’t wait until after a paper is published to confirm the source(s) of their data.

      F. Wayne Bell, PhD

Let me see if I understand you correctly. You think Ken A. Thompson–an UNDERGRADUATE at the time this paper was written–was the “senior” author? So that it was ultimately his responsibility, and not his professor’s, to ensure that the data *provided by his professor* was not seriously mistaken or fraudulent? I can hardly believe that’s your view, yet I also can’t see any other way to read your comments.

        I’m curious whether you are willing to generalize that principle. Do you think that responsibility for the many anomalies underlying Jonathan Pruitt’s numerous retractions lies not with Pruitt, but with the trainees and collaborators to whom he provided data? Or think of social psychologist Diederik Stapel, who provided fake data to many trainees, and as a result lost his job and had dozens of papers retracted? Are you suggesting that injustice was done in the Stapel case? That Stapel should’ve kept his job and his papers, because the fault was with his trainees who failed to discover at the time that they were being lied to by their supervisor?

        Leaving aside any question of who should’ve done what at the time the paper was written, you know who is spending a lot of time trying to correct the scientific record *now*? Ken Thompson. You know who is apparently resisting Ken’s efforts to correct the record, or at a minimum failing to aid them? Steven Newmaster.

        Re: the DNA data being in the public domain: as Ken explains, the DNA data that purportedly underpins the paper (but does not in fact do so, for reasons Ken explains) was not placed into the public domain until long after the paper was written. Please explain how Ken was supposed to have done due diligence at the time on data that wasn’t provided to him or to anyone else.

        Re: your airplane analogy, I think it’s a very poor analogy, but it may perhaps be unintentionally revealing. After all, the plane really wasn’t fueled properly. Analogously, I take it you’re granting Ken’s claims–that there really are serious anomalies here?

I note that your Guelph colleague Paul Hebert has indicated publicly his agreement that there are in fact serious anomalies here that need explanation. Paul himself identified an additional anomaly above and beyond those identified by Ken. Paul has also indicated that, in his view, Guelph’s investigation was not sufficiently independent or rigorous. Paul’s remarks are in the comments on Ken’s blog post. Paul Hebert pioneered DNA barcoding and is the world’s leading expert on the technique, as I’m sure you’re aware. So I’m sure that you, like me, take his comments very seriously.

        With all due respect, I’ve reread your comments several times now, trying and failing to find a generous way to read them. But I find it very hard to read them as anything other than an attempt to muddy the waters and shift blame. I certainly hope that wasn’t your intent. Nothing in your comments addresses the serious anomalies that have been identified in this paper. I hope you will take the opportunity to do that now. Because I’m sure that you, like Ken, Paul, and I, want to uphold the integrity of scientific practice and the scientific record. Do you agree or disagree that Ken and Paul have identified serious anomalies in need of explanation? And do you agree or disagree that it is Steven Newmaster’s professional and moral responsibility to provide that explanation?

Wayne: I have little to add beyond what Jeremy wrote. You are claiming that an undergraduate student who (i) received data from a tenured professor as part of a mandatory course for credit; and (ii) was pushed to publish the findings by this professor; bears responsibility for the integrity of the data. I suspect most people will agree with Jeremy and me that the responsibility lies with the professor.

Your analogy is problematic. You shouldn’t consider me a trained pilot; I was more like someone on their first day of flight school.

I do take responsibility for not saying something earlier even though I did have concerns, and I just hope people can sympathize with why it was difficult to come forward as a lone whistleblower. I also think it’s good you posted here, after you expressed a similar sentiment in our private correspondence.


Dear F. Wayne Bell, PhD: I look forward to your responding to the questions very clearly outlined by Jeremy Fox. For future reference, a pilot who is responsible for an aircraft, and who takes responsibility for any issues pertaining to that aircraft, is commonly referred to as the Captain. That you clearly expect Ken to have acted as a captain a few years into his undergraduate degree is concerning. We need to recognize that students, like Ken, are not in a position of power, unlike his professor (but not Captain, I guess). I only hope you provide better guidance to your potential students than you appear willing to give in this case.


      • I didn’t bother to list all the ways in which Wayne’s airplane pilot analogy is bad, but here’s one:

  2. Latest on Pruitt: he’s writing a fantasy novel. No, really:

    You’d think that he’d want to keep his head down….

