There’s been much discussion recently of irregularities in the raw data underpinning numerous papers by prominent behavioral ecologist Jonathan Pruitt. No formal investigation by Pruitt’s current or former employers is yet complete; it’s far too early for that. But inevitably, discussion and speculation about the Pruitt case have morphed into broader online discussion of scientific misconduct, defined for purposes of this post as fraud, fabrication, and plagiarism.*
This post is about those broader discussions, not the ongoing Pruitt case. How prevalent is scientific misconduct, what causes it, and what if anything should be done to reduce its prevalence? From what I’ve seen, those discussions of broader issues around scientific misconduct have mostly been informed by single examples, such as the recent case of Peter Eklöv and Oona Lönnstedt. I think that’s understandable but also a little unfortunate. We’re scientists; we don’t ordinarily generalize from a sample size of n=1. There’s a literature on scientific misconduct–we should learn from it! So I spent a bit of time reading the literature on scientific misconduct, some of which I’d read before but some of which was new to me. Here’s a summary of what I found. I am by no means an expert on scientific misconduct. But hopefully this post advances the ongoing discussion in some small way, by raising awareness of the relevant literature.
You should totally grab a coffee and read the whole thing, because some of these data are probably the opposite of what you expect them to be!
Note: this is a long-ish post. (sorry!) Please do read the entire post before you tweet about it or comment on it. And please don’t leap to conclusions about my views on anything that’s not explicitly stated in the post. Scientific misconduct is an important issue on which people have strongly-held views. So I did my best to phrase this post carefully. I can’t promise I was perfect; nobody’s perfect. By all means ask in the comments if anything is unclear, and if necessary I’ll update the post and flag the updates as such.
I went with a Q&A structure:
Is scientific misconduct a recent phenomenon? How far back do cases of scientific misconduct go?
Cases of scientific misconduct go back more than a century at least. The Retraction Watch database of retractions (which is quite extensive though not comprehensive) lists 48 retracted papers from before 1980, many of them retracted for misconduct. For instance, this fake medical case report from 1923.
Notable old cases of scientific misconduct (plus a few borderline/controversial old cases) include:
- In the “Piltdown Man” case of 1912, someone (almost certainly Charles Dawson) faked sensational Pleistocene fossils purportedly from an ancestor of modern humans. Dawson faked many other antiquities.
- In the mid-20th century, Cyril Burt fabricated data from twin studies so as to inflate the apparent heritability of IQ.
- Controversial psychologist Hans Eysenck is on track to have dozens of papers retracted, some of them 60 years old.
- R. A. Fisher famously accused Gregor Mendel of fudging his genetic data to improve conformity with Mendel’s laws. Hartl & Fairbanks (2007) defended Mendel.
- In his 1981 book The Mismeasure of Man, Stephen Jay Gould made numerous dubious data analytical choices in order to falsely accuse 19th century anthropologist Samuel Morton of scientific fraud regarding human skull measurements. Whether Gould’s dubious analytical choices themselves rise to the level of scientific misconduct is a question on which you’d probably get different answers if you asked different knowledgeable people.
- Wikipedia’s incomplete list of notable scientific misconduct cases includes other cases that go back before 1980, besides some of those noted above. Former Harvard cardiologist John Darsee got a 10-year NIH funding ban in 1983 for a track record of serial misconduct going back many years before that. Prominent Boston University medical researcher Marc Straus admitted to using false data in 1982. Dermatology researcher William Summerlin admitted to scientific fraud in 1974.
- Bernard Kettlewell’s famous experiments on evolution of melanism in peppered moths in the 1950s and ’60s were claimed to be fraudulent in a 2002 book by journalist Judith Hooper. My understanding from everything I’ve read is that Hooper’s claims of fraud are groundless. But the case is famous, so I’m including it here because otherwise I’m sure someone would bring it up in the comments. (UPDATE: error in Bernard Kettlewell’s name fixed now. My bad.)
Scientific misconduct is thus much older than many current features of academic science, such as the competitive academic job market or pressure to “publish or perish”. See below for data speaking to the question of whether pressure to “publish or perish” is among the main drivers of scientific misconduct.
How many scientists commit scientific misconduct? Are the ones who get caught just the tip of a very large iceberg?
All the evidence indicates that only a very small minority of scientists ever commit misconduct, though it’s hard to put an exact number on it.
In anonymous surveys, about 2% of scientists admit to having fabricated, falsified, or modified data at least once (Fanelli 2009). Note though that there aren’t that many surveys, many of them have tiny sample sizes (<200 scientists), some are highly non-random samples, and many of them focus on US biomedical researchers. Also, some people won’t admit to bad conduct even in anonymous surveys. UPDATE: On the other hand, some people will say all sorts of things in surveys just for the LOLs. 4% of Americans claim to believe that lizardmen control the Earth. So I wouldn’t necessarily assume that that 2% number is an underestimate. As that last link points out, when you’re trying to poll on the prevalence of any rare, unpopular belief or behavior, any little source of noise, such as a few jokesters, can easily distort your estimate. /end update
FWIW, the random survey with by far the largest sample size–Martinson 2005 (n=3247)–found that well under 1% of US NIH-funded researchers admitted to scientific misconduct.
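The point above about survey noise can be made concrete with a minimal sketch. The function name and all of the noise rates below are illustrative assumptions, not figures from any of the surveys cited; the only aim is to show that quite different true rates can produce roughly the same 2% of "yes" answers.

```python
def observed_admission_rate(true_rate, false_yes_rate, false_no_rate):
    """Expected fraction of respondents answering "yes" in a survey.

    true_rate:      fraction who really did the behavior (assumed)
    false_yes_rate: fraction of innocent respondents who say "yes" anyway,
                    e.g. jokers or misreadings (hypothetical value)
    false_no_rate:  fraction of guilty respondents who deny it (hypothetical)
    """
    honest_yes = true_rate * (1 - false_no_rate)
    noisy_yes = (1 - true_rate) * false_yes_rate
    return honest_yes + noisy_yes


# Hypothetical scenarios that all yield roughly the same "yes" rate.
scenarios = {
    "no noise, true rate 2%": observed_admission_rate(0.02, 0.0, 0.0),
    "1% jokers, true rate ~1%": observed_admission_rate(0.0101, 0.01, 0.0),
    "half of offenders deny, true rate 4%": observed_admission_rate(0.04, 0.0, 0.5),
}

for label, rate in scenarios.items():
    print(f"{label}: observed {rate:.2%}")
```

In other words, a survey estimate near 2% is compatible with a true rate that is noticeably lower or higher, depending on which kind of noise dominates.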
As you’d expect, rates of scientific misconduct estimated from the outcomes of formal misconduct investigations run much lower than that, presumably because not every instance of misconduct gets caught. Somewhere between 1 in 10,000 and 1 in 100,000 US researchers have been found guilty of misconduct in US government investigations (Marshall 2000, Steneck 2006).
How many papers are the product of scientific misconduct?
Few, though it’s of course hard to give an exact percentage because not all papers that are the product of misconduct are detected as such, or are publicly revealed to have been detected as such.
Currently, only about 0.04% of papers are retracted for any reason, so the frequency of papers retracted for misconduct has to be lower than that. And indeed, about 0.02% of papers in the PubMed database had been retracted for misconduct (Claxton 2005, Campos-Varela et al. 2019).
Back in the early oughts, 1% of submissions to the Journal of Cell Biology had improperly manipulated digital images (Steneck 2006). In a much larger study across many journals (mostly biomedical) and a greater time span, Bik et al. found that 2% of published papers had images with features suggesting deliberate, inappropriate manipulation.
UPDATE #2: Random audits of cancer clinical trials found that only 0.28% of trials contained “scientific improprieties”. Random audits of FDA clinical trials conducted between 1977 and 1988 found evidence sufficient to initiate a “for cause” investigation in 4% of trials. Data cited and discussed in Fanelli 2018. /end UPDATE #2
Is the rate of retractions for misconduct increasing? If so, is that because of increasing frequency of scientific misconduct, or because we’re getting better at detecting and responding to misconduct?
Yes, the rate of retractions for misconduct has increased, and the evidence suggests that’s because we’re getting better at detecting and responding to misconduct.
The absolute number of retractions has grown over time. So has the fraction of retractions that are due to misconduct as opposed to some other reason (Fang et al. 2012, and see here). About half of retractions, or perhaps somewhat more than half, are now due to misconduct (Fang et al. 2012, Li et al. 2018, and see here). So the absolute number of papers retracted for misconduct has grown over time. But of course, the number of scientific papers has grown over time, so you’d expect the absolute number of retractions for misconduct to increase for that reason alone. You want to look at the frequency of retracted papers among all papers. The frequency of retracted papers roughly doubled from 2003-2009, but stopped increasing around 2012. That increase in the frequency of retractions likely does not reflect an increase in the frequency of scientific misconduct. Rather, it likely reflects increasing efforts to detect misconduct, and to retract papers resulting from misconduct. There are several lines of evidence for this view:
- Journals are now retracting papers much more quickly than they used to (Steen et al. 2013). That is, the average time from when a retracted paper is first published, to when it gets retracted, is dropping.
- In 2004, just 1/4 of high-impact biomedical journals had policies on retractions. In 2009, the influential Committee On Publication Ethics published a model journal retraction policy. By 2015, 2/3 of high-impact biomedical journals had a retraction policy (see here).
- Errata are not increasing in frequency (Fanelli 2013). If you think of scientific misconduct as falling on one end of a continuum, with minor unintentional errors on the other end and various forms of questionable research practices in the middle, then you might expect that everything else on that continuum would increase in frequency if misconduct increased in frequency. But errata, which correct minor errors, are not increasing in frequency.
- The number of journals that have published at least one retraction has increased dramatically over time. But among journals that have published at least one retraction, the mean number of retractions per journal has not increased (Fanelli 2013). That’s consistent with more journals starting to take both mistakes and misconduct seriously, but not with increasing prevalence of mistakes or misconduct. The latter would also lead to an increased number of retractions per journal, among journals that have had at least one retraction.
- The number of queries and allegations made to the US government Office of Research Integrity (ORI) has increased over time, but the ORI’s frequency of misconduct findings has not increased (Fanelli 2013).
- In 2010, the popular Retraction Watch website launched, increasing the attention paid to retractions by scientists and media outlets.
- In 2012, the popular PubPeer website launched, providing a novel means by which potential cases of scientific misconduct could be brought to the attention of journal editors and other scientists.
- Starting around 2004, some journals started using text-matching software to detect possible plagiarism in all their submissions. The subsequent increase in use of text-matching software is presumably what explains the post-2004 increase in the fraction of retractions due to plagiarism.
- Similarly, many biomedical journals now routinely use software to automatically check for certain forms of image manipulation, particularly gel and fluorescence images. Presumably for this reason, the frequency of papers containing inappropriately-manipulated images has been declining since it peaked in the mid-oughts.
- There was a big spike in retractions of conference abstracts around 2009, when the Institute of Electrical and Electronics Engineers started paying more attention to whether its many conference abstracts met its guidelines, and found that thousands of abstracts didn’t.
UPDATE: How much money does scientific misconduct cost funding agencies?
I updated the post because I just stumbled across a paper addressing this question in the context of US biomedical research. From 1992-2012, the US NIH spent approximately $58 million of direct research funding on papers that later got retracted, and on researchers later found guilty of misconduct by the US ORI. That’s less than 1% of the NIH budget over that period. /end update
UPDATE #3: Looking at the semiannual reports of the US NSF Office of the Inspector General, I see that in recent years OIG reports about $8-10 million annually in “questioned costs”, “investigative recoveries”, and “funds put to better use”. That includes costs and recoveries associated with financial misconduct, as well as costs and recoveries associated with scientific misconduct. Note as well that much of the financial misconduct is by institutions rather than individual PIs; it’s mostly not PIs stealing grant money for personal use. For instance, it’s stuff like institutions misspending research grant money on teaching assistants, or not properly accounting for rebates they received on equipment purchases. For context, in recent years NSF’s annual budget (not just grants, everything) has been a bit over $8 billion. So we’re talking about ~0.1% of the total NSF budget going to misconduct that later gets detected, most of which (in terms of the money involved) isn’t the sort of misconduct considered in this post. So even if 90% of scientific misconduct associated with NSF grants goes undetected, it’s still <1% of NSF’s annual budget. (Random aside: until I read the NSF OIG’s semiannual reports, I didn’t realize that universities and contractors defrauding NSF is a much bigger deal than scientific fraud is, in terms of misspending or wasting NSF money.) /end update #3
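For anyone who wants to check the NSF arithmetic, here is a rough sketch. The dollar figures come from the paragraph above; the `SCIENTIFIC_FRACTION` value is purely an assumption for illustration, since the reports as summarized here only support saying that "most" of the detected money is not scientific misconduct.

```python
# Figures from the paragraph above, in US dollars (approximate).
NSF_ANNUAL_BUDGET = 8e9            # "a bit over $8 billion", rounded down
DETECTED_PER_YEAR = (8e6, 10e6)    # OIG-reported costs and recoveries

# Assumption for illustration only: at most half of the detected money
# relates to scientific (rather than financial) misconduct.
SCIENTIFIC_FRACTION = 0.5


def budget_share(detected, scientific_fraction, undetected_fraction):
    """Share of the NSF budget implied for scientific misconduct,
    scaling detected costs up to account for undetected cases."""
    detected_scientific = detected * scientific_fraction
    implied_total = detected_scientific / (1 - undetected_fraction)
    return implied_total / NSF_ANNUAL_BUDGET


for detected in DETECTED_PER_YEAR:
    all_detected = detected / NSF_ANNUAL_BUDGET
    scaled = budget_share(detected, SCIENTIFIC_FRACTION, 0.90)
    print(f"${detected / 1e6:.0f}M detected = {all_detected:.3%} of budget; "
          f"implied scientific misconduct at 90% undetected = {scaled:.2%}")
```

Note that if you set the scientific fraction to 1.0, the scaled-up figure lands at or slightly above 1%, so the "<1%" conclusion leans on most of the detected costs being financial rather than scientific misconduct.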
Is it mostly men who commit scientific misconduct?
Depends if you’re considering everyone who commits scientific misconduct, or just the most prolific serial offenders.
The most egregious serial offenders are almost exclusively men. More broadly, men are somewhat overrepresented among researchers found to have committed misconduct in US government investigations (most of which are investigations of biomedical researchers), compared to their representation among all biomedical scientists. However, US government misconduct investigations focus on government grant holders. Government grant holders are more male-skewed than are all biomedical scientists for various reasons, many of which have nothing to do with propensity to commit misconduct (e.g., social forces and sex discrimination that steer women towards teaching careers rather than research careers). If you instead compare authors of retracted papers to authors of non-retracted papers from the same issue of the same journal, you find that men are not overrepresented among authors of retracted papers (Fanelli et al. 2015).
Are there other predictors of who commits scientific misconduct, and where papers based on misconduct are published? In particular, is there evidence that scientific misconduct is more common in countries with a stronger “publish or perish” culture? Are “top” researchers especially likely to commit misconduct? Are papers in “top” journals especially likely to be based on misconduct?
Based on what I’ve read, the answers to those questions seem to be (in order), “yes”, “no, just the opposite”, “no, just the opposite”, and “no, just the opposite”.
If you compare authors of retracted papers to authors of non-retracted papers from the same issue of the same journal, you find that authors of retracted papers are more likely to be based in countries that lack research integrity policies, to be based in countries in which individual publication performance is directly rewarded with cash (i.e. $X per paper), and to be in the early phases of their careers (Fanelli et al. 2015). Productive, experienced, high-impact researchers, based in countries that are thought to have a stronger “publish or perish” culture, are less likely than others to produce retracted papers, and are more likely to publish corrections to their papers for minor errors (Fanelli et al. 2015).
Other analyses are broadly in line with the results of Fanelli et al. (2015). For instance, among papers published in PLOS ONE, papers originating from the US, Canada, western Europe, Australia, Japan, and South Korea have a lower frequency of inappropriately manipulated images than expected, given the total number of papers originating from those countries. Papers originating from China, India, and Taiwan have a higher frequency of inappropriately manipulated images than expected, given the total number of papers originating from those countries.
The frequency of papers with inappropriately-manipulated images declines with journal impact factor. Note that that result comes from a random sample of many images across many journals. And among biomedical papers, the proportion of retractions that are due to misconduct, as opposed to some other reason, decreases with journal impact factor.
UPDATE #4: Much of the evidence cited above regarding predictors of scientific misconduct is cross-country comparative evidence. One might argue that one should also look at within-country comparisons. Fanelli et al. 2022 did that. Their findings reinforce the cross-country evidence. Using a similar matched-pairs design to Fanelli et al. 2015 (cited above), Fanelli et al. 2022 find that, within wealthy countries like Canada, the UK, and the US, image manipulation is not associated with measures of researcher productivity, experience, or prestige. It’s only within low- and middle-income countries that pay researchers $X/publication (especially China) that image manipulation is associated with measures of researcher productivity, experience, and prestige, in such a way as to suggest that the incentives cause misconduct.
My tentative interpretation of these data is that many factors affect the prevalence of misconduct, different factors have opposing effects, and some of those factors are at least somewhat collinear. One broad implication is that there may be many different policy interventions that would reduce the prevalence of misconduct in any given context. In principle, one could imagine dialing down any of a number of misconduct-promoting factors, and/or dialing up any of a number of misconduct-reducing factors. Which factors to try to dial up or down seems like a pragmatic empirical question to me, the answer to which depends on all the usual sorts of considerations–marginal costs, marginal benefits, externalities, etc.
Is a disproportionately large fraction of scientific misconduct committed by a small number of serial offenders?
Yes. Steen et al. (2013) found that over 40% of retracted biomedical papers were written by authors with multiple retractions to their names. And in a more recent analysis of a more comprehensive retraction database, the 500 authors with the most retractions (out of a total of 30,000 authors with at least one retraction) accounted for 25% of all retractions. And 7% of all retractions in the database between 1980 and 2011 (so, >7% of all retractions for misconduct during that time) are due to a single author (!).
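To get a feel for how concentrated that is, here is a back-of-the-envelope sketch using only the figures in the paragraph above. It makes one simplifying assumption that the source does not: each retraction is treated as belonging either to the top 500 authors or to everyone else, ignoring co-authorship between the two groups.

```python
# Figures from the paragraph above.
TOP_AUTHORS = 500
AUTHORS_WITH_RETRACTION = 30_000
TOP_SHARE_OF_RETRACTIONS = 0.25

share_of_authors = TOP_AUTHORS / AUTHORS_WITH_RETRACTION

# Simplification (an assumption): split retractions cleanly between the
# top group and the rest, ignoring shared authorship across groups.
per_author_top = TOP_SHARE_OF_RETRACTIONS / TOP_AUTHORS
per_author_rest = (1 - TOP_SHARE_OF_RETRACTIONS) / (
    AUTHORS_WITH_RETRACTION - TOP_AUTHORS
)
concentration_ratio = per_author_top / per_author_rest

print(f"Top group: {share_of_authors:.1%} of authors with a retraction,")
print(f"holding {TOP_SHARE_OF_RETRACTIONS:.0%} of all retractions,")
print(f"roughly {concentration_ratio:.0f}x the per-author rate of the rest.")
```

So under that simplification, fewer than 2% of authors with any retraction account for a quarter of all retractions.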
What do we know about the motivations and other attributes of people who commit scientific misconduct? In particular, what do we know about the motivations and other attributes of the rare serial fraudsters who become prominent in their fields?
Above, I noted some systemic factors that predict occurrence of scientific misconduct (e.g., cash payments for publications, lack of research integrity policies). But systemic factors alone can’t fully explain occurrences of misconduct, since after all the large majority of scientists never commit misconduct.** So is there anything that individuals who commit scientific misconduct tend to have in common, that’s associated with them specifically? Rather than with the broader milieu in which they and their many honest colleagues work? In particular, what drives the very rare people who rise to prominence in their fields via serial misconduct?
Hard to say, unfortunately, beyond the fact (noted above) that the most egregious serial fraudsters are almost exclusively men. As best I can tell, serial scientific fraudsters mostly seem to deny wrongdoing, and then just leave science if and when they’re convicted of enough wrongdoing to end their careers (here’s just one of many possible examples). It seems to be rare for anyone who commits scientific misconduct to admit what they did, much less explain their own motivations.
You can make some reasonable inferences about motivation in a few cases. For instance, many of the researchers who’ve racked up dozens of retractions for misconduct were medical researchers pushing their own pet medical techniques or devices. As another possible example, two of the famous old cases of scientific misconduct listed above have to do with IQ heritability and eugenics (Burt, Eysenck). The common thread here is researchers who were super-attached to their own beliefs. Beliefs they were prepared to push at any cost, up to and including fabrication.
One notable commonality among many of the most egregious serial fraudsters is that they continued to commit fraud long after they reached secure, well-paid, senior positions. Positions that they would have kept even if they’d dialed back their research programs, or wound them down entirely. The worst serial fraudsters seem to keep committing fraud long after, and out of all proportion to, any reasonable “need” to get or keep a good job in science. In my admittedly-cursory research, I haven’t found any examples of serial scientific fraudsters who stopped voluntarily, before being caught. Are there any?
A few years ago, social psychologist and serial fraudster Diederik Stapel wrote a book confessing what he did and purporting to explain why he did it. I’ve skimmed bits of it. It’s engaging, because it’s well-written, because it’s such a rare window into the mind of a serial scientific fraudster, and, well, for the same reason it’s hard to look away from a car crash. But it’s also transparently self-serving. So I dunno. I’m not a psychologist, so I find it hard to separate honest self-reflection from dishonest excuse-making here. Stapel is now a motivational speaker (if that’s the right term…). One wonders if the book was his way of kickstarting his new career.
On their own, these data obviously don’t tell us what, if anything, we should do differently at a systemic level in order to prevent, detect, and punish scientific misconduct. See Dan Bolnick and Andrew Hendry’s blog for some good concrete discussion of that. But hopefully, these data inform that broader discussion. I hope to have a few thoughts of my own on what to do about misconduct in a future post, inspired (as is usual with me) by something I read that isn’t about science. In the meantime, the comments are open. Looking forward to learning from your knowledge and opinions.
*I use this definition not because I oppose other, broader definitions, for instance those that define bullying and harassment as scientific misconduct. Rather, I use this definition just to keep the post to a manageable length, and to keep the focus on the specific subcategories of misconduct that have been widely discussed online among ecologists recently. If you would prefer to discuss other forms of misconduct, such as bullying, you are welcome to comment on our numerous past posts on those other forms of misconduct (for instance here and here and here). The post authors will still see your comments, and reply (or not) just as they always do.
**As Tal Yarkoni wrote in a slightly different context, “it’s not the incentives, it’s you”.