Note: this post grew out of an email exchange I had with Stephen Heard last week. Stephen suggested an idea for a post that we both wanted to write. We decided to write our posts independently and post them on the same day. So read this post and then go see what Stephen has to say. I’m curious to see what he has to say too! I predict we said much the same thing, but I hope I’m wrong because that would be more fun. 🙂
Last week I linked to a major case of scientific fraud in psychology. It involved a study of the odometer readings people report to their car insurance companies. Here’s a histogram of one of the key variables in the study:
These data are obviously fake. You ask thousands of people to report how many miles they drove over some period of time, and you get a uniform distribution between 0 and 50,000 miles? Pull the other one, it’s got bells on.
This is a common feature of scientific frauds that involve fake data. Often, the fake data do not stand up to even casual scrutiny.
Which is puzzling. If you’re going to commit scientific fraud, presumably you want to get away with it. So why commit fraud in such a transparently obvious way?
After all, plenty of frauds in other walks of life are designed to stand up to scrutiny. Think of art forgery. Art forgers are very skilled artists. They go to great lengths to make their forgeries stand up to visual inspection by expert art historians who are on the lookout for forgeries. It’s a striking contrast with the laughable obviousness of many scientific frauds.
Or is it? Because there’s an important sense in which the shoddiest scientific fraud and the most careful art forgery are exactly the same. Both are designed to stand up to the scrutiny they’re likely to receive.
If you’re forging a Rembrandt, you know that your painting is going to be closely inspected by expert art historians. That’s why you have to go to great lengths to make it look like a Rembrandt. Whereas if you’re faking the data in a scientific paper, the data are unlikely to be closely inspected by anyone. Heck, until fairly recently your data were unlikely to be inspected by anyone because you weren’t expected to show them to anyone! And even these days, when post-publication data sharing is increasingly the rule in many fields, it’s still rare for the shared data associated with any given paper to be inspected by anyone. Not even casually, never mind closely. So a Rembrandt forgery and a typical fake dataset are similarly obvious fakes, relative to the differing levels of scrutiny they’re likely to receive.
The commonality here is well-illustrated by the fact that many art forgeries are obvious–if scrutinized in a novel way. For instance, the linked piece on art forgery discusses forger Wolfgang Beltracchi, who was exposed via chemical analysis of the pigments he used. His forgery of a 1914 painting used a titanium-based pigment that didn’t exist in 1914. From a visual perspective, it’s not at all obvious that the painting is a fake. But from the perspective of chemical analysis, the fakery is laughably obvious. Chemical analysis of pigments wasn’t a routine method of scrutiny when Beltracchi forged the painting in question; the chemical analysis was only conducted decades later. Something similar seems to have happened in the scientific fraud case I linked to last week. The fake odometer data were originally written up for publication in 2012. The fake data weren’t shared publicly until 2020, in association with a paper that failed to replicate the 2012 result. Data sharing, and replication attempts, are a lot more common now than they were in 2012. They’re novel forms of scrutiny to which the 2012 paper wasn’t originally subjected.
This is a general principle of fraud in all walks of life. Frauds are usually only designed to beat whatever fraud-prevention measures the fraudster knows they’re likely to be subjected to. Hence Dan Davies’ “Golden Rule” of financial fraud detection, which applies to non-financial frauds as well. Paraphrasing, the rule says, in part, that if you think something might be fake, you need to check it out in a way it hasn’t already been checked.
p.s. Before anyone points it out: yes, there are some fraudsters in science, and other walks of life, whose frauds aren’t designed to stand up to the scrutiny to which the fraudster knows they’ll be subject. For instance, consider plagiarism by undergraduate students. I talk to my undergraduate biostats students at length about what plagiarism is, and why it’s against university rules. I tell them that I have ways of detecting plagiarism on lab assignments, even if they plagiarize from an old assignment in an online repository, or only plagiarize part of the assignment, or plagiarize from someone in a different lab section, or paraphrase to try to hide their plagiarism, etc. I point out the various resources that are available to support them if they’re struggling with their coursework, and encourage them to use those resources rather than resorting to plagiarism. I even tell them that, every semester, I give this same speech and still catch students plagiarizing. And yet, every semester, some students commit plagiarism anyway, often in dead-obvious ways. I don’t know why they do it, and I’ve given up trying to understand it. People aren’t always rational. Sometimes they aren’t even “predictably irrational”.* Sometimes they’re just panicked, or lazy, or drunk, or dumb, or whatever. And sometimes, people act in ways that are completely inscrutable. The point of this post is that the obviousness of many scientific frauds has a rational explanation (nobody’s likely to check for fraud), not that the obviousness of scientific frauds always has a rational explanation.
p.p.s. Now I’m wondering: what are the least obvious scientific frauds in history? The ones that best stand up to scrutiny from many different angles? My first thought was duplication and relabeling of images across papers, from back in the days before automated image matching software. Here’s another candidate, though it apparently involved elaborate steps to cover up the fraud after investigation began, rather than elaborate steps to disguise the fraud in the first place. I’d be curious to hear from paleontological commenters as to how careful a fake Piltdown Man was, relative to the methods of scrutiny available at the time.
*Sorry, couldn’t resist this…obvious…joke. 😉
Pingback: Why are scientific frauds so obvious? | Scientist Sees Squirrel
Although I’m not a paleontologist, I do have a deep interest in paleoanthropology; there were suspicions about the Piltdown forgery from the very beginning, and comparative anatomical evidence that the jaw, teeth, and skull were from different sources. Check out the penultimate and final paragraphs in the ‘Find’ section of the Wikipedia entry, for instance: https://en.wikipedia.org/wiki/Piltdown_Man
Coming from biology, and now working partly in the field of image-integrity testing in the life sciences, I can confirm that the same issue has long existed for scientific image data. I wanted to add one thought to your post, and it’s a rather scary one. There is indeed quite a high number of image-manipulation cases which, as you note for other data, seem super sloppy and obvious. But what if those are the only ones we are able to find and reveal, due to the limitations of the detection methods? Then all the “good” forgeries will never show up; they will pollute the scientific literature and, in a snowball-like effect, negatively affect scientific conclusions, discussions, funding decisions, and resulting developments (such as medicine), in the worst case potentially even the health of individuals. So the part that causes me some unease is how high the numbers in the shadows might be, which we may never be able to detect. Others might argue in favour of the self-correcting nature of science. But I think that argument is rather illusory, considering the many necessary but never-executed corrections/retractions, the influence of nepotism, the absence of expressions of concern to warn other scientists of problematic data, and other hurdles diminishing the corrective effect, to name just a small selection. So the tip of the iceberg is just the obvious, and most likely smaller, part of the problem. This is generally a tricky topic with no simple and straightforward solution. Here is a potentially interesting read about parts of the problem: https://www.researchgate.net/publication/349521202_Seeing_the_Big_Picture_-_Scientific_Image_Integrity_under_Inspection
“Now I’m wondering: what are the least obvious scientific frauds in history?” How about the long-term activities of the anthropologist Mart Bax, see https://doi.org/10.16995/ee.1646 for backgrounds?
Ooh, interesting! Yes, that certainly seems like a good example of a well-disguised fraud.
One bit that caught my eye: Bax worked alone in the field, at his own field sites. His students worked elsewhere and weren’t dependent on Bax’s data at all. I’ve been wondering lately if the way to get away with fraud is to work solo. Meaning not just “no collaborators” but also “nobody sees you collecting data, or expects to do so.” And further, this wasn’t some unusual choice of his–my understanding is that many cultural anthropologists work solo.
It sounds like Bax also was collecting the kind of data that would be hard to detect as fake, at least using statistical methods. Lots of singular observations, or else fairly small numbers of observations? If you only have one or a few observations of some variable, you can’t make any inferences about the data-generating process–what distribution the observations were sampled from, etc.
And of course, as the linked article describes, there were various other circumstances that were conducive to fraud.
And yet, Bax was caught eventually. Apparently, because someone was bothered that lots of little details and passing remarks in his lectures seemed wrong, and started digging. And lots of people apparently had been vaguely suspicious of Bax for that reason.
YMMV, but the lesson I take away from the Bax case is that it is just very, very hard to get away with scientific fraud forever. Though I suppose someone else might be more distressed at how long Bax got away with it.
So then what proportion of publications are incorrect? Of those, we have clear fraud, undetected fraud, badly designed, badly analyzed, and black swan (unlucky) papers. Is it 5%? 40%? It seems estimable. At what level does the contamination taint the whole body?
I was just thinking about a related question myself. If you compiled data on how long a whole bunch of scientific frauds lasted, would you be able to infer something about the frauds we haven’t detected yet? A bit like how, if you keep track of how long a bunch of now-dead people lived, you can infer something about the expected lifetimes of people who are still alive.
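The lifetime analogy in the comment above can be sketched with a toy Kaplan-Meier estimator, the standard survival-analysis tool for data where some observations are "censored" (here: frauds not yet detected). All numbers below are invented for illustration; nothing in the post provides real durations.

```python
# Toy sketch of the survival-analysis analogy: treat each fraud's "lifetime"
# (years until detection) like a lifespan, with still-undetected frauds as
# censored observations. All numbers are made up.

def kaplan_meier(durations, detected):
    """Kaplan-Meier curve: estimated P(fraud still undetected after t years).

    durations: years each fraud was observed (until detection, or until now)
    detected:  True if detected at that time, False if still undetected (censored)
    """
    event_times = sorted(set(d for d, e in zip(durations, detected) if e))
    curve, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)            # still "alive" at t
        events = sum(1 for d, e in zip(durations, detected) if e and d == t)
        s *= 1 - events / at_risk                                # survival update
        curve.append((t, s))
    return curve

# Five frauds detected after 1-10 years; three still undetected (censored).
durations = [1, 2, 3, 6, 10, 5, 7, 12]
detected = [True] * 5 + [False] * 3
curve = kaplan_meier(durations, detected)
# curve[0] == (1, 0.875): 7 of the 8 frauds remain undetected after year 1.
```

The censored observations matter: dropping the undetected frauds would bias the estimated detection times downward, which is exactly the commenter's point about inferring something about frauds we haven't caught yet.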
Another thought: what sort of data on scientific frauds would you need to apply statistical techniques from wildlife biology to them? I’m thinking of estimating where species live, and how abundant they are, from what ecologists call “presence-only” data. That is, all you have are records of where individuals of the species were detected. No records of where someone tried to find the species and couldn’t. That’s similar to scientific fraud–we have many records of detected frauds, but many fewer records of people trying and failing to detect frauds (There are only a few studies in which someone’s randomly sampled a bunch of papers, and then gone through them all checking for some sort of fakery. Elizabeth Bik and colleagues did this for image manipulation.)
Your second paragraph is very interesting. I am not an ecologist, but I could imagine the following (sorry, this leads me somewhat away from the original topic, and might make an interesting new one). Taking all measurable parameters into account, such as terrain, temperature, weather, microclimate, vegetation, prey availability, predator pressure, etc., one could potentially think of a mathematical model to estimate the existence regions of a certain animal on a global level. One should be able to calculate a probability map of potential existence based on those factors, measured in the places where the animals were already observed. The question is how much input one would need to get a somewhat accurate estimate.
Relating that to fraud, I think there are too few measurables to make a reliable model. There are, e.g., individual biases of different kinds (regarding this, I was astonished to find such an overview of links: https://en.wikipedia.org/wiki/Bias), subjectively perceived pressure, misguided monetary incentives, missing or neglected supervision of junior scientists, and so on. Most of those are things people would never comment on, or admit to suffering from. As I just read in Adam Grant’s book “Think Again” (a very recommendable read): “My favorite bias is the ‘I-am-not-biased’ bias!”
So I think that, in this case, data acquisition would make it hard to feed a potential model.
Would be worth an interesting discussion though!
This is a very interesting point. Some random thoughts on that:
I do not believe (but belief is subjective) that we can reliably estimate it in the complete absence of any hints about what fraud is out there, the methods involved, and the possibilities and limitations of their detection. From Elizabeth Bik’s data, as well as my own experience, I can estimate that fraud based on image manipulation is well around, more likely somewhere above, 5% in the life sciences. But again, that is based on what we find, neglecting what we do not find. One example of the latter would be someone showing an image and stating that it is from a control sample. Depending on its “shelf life”, the sample might be long gone by the time the image is published. And anyway, you can also mislabel samples. So, at some point, the necessary trust in scientific honesty is all that remains.
For an upper limit, I think the estimation is not really possible. Image manipulation is only a fraction of the issues in the published literature, and more prominent in the natural sciences. Statistics has its part (often seen or inspected in, e.g., psychology); text plagiarism is rather low in the life sciences (but that is more a guess of mine), and rather higher in fields like the social sciences and/or politics (the scientific part). In this regard, Debora Weber-Wulff is potentially a good resource, since she is involved mainly in detecting plagiarism.
Another point is the debatable question of when negligence or bad design ends and manipulation or fraud starts. There is surely not only black and white, but rather a huge grey zone. What category do predatory journals fall into? There, after all, the fraud comes mainly from the publishing side.
So, depending on what you look at from the list you provided, the individual parts might vary strongly, there will surely be some overlap between the categories, and the cumulative sum will most likely remain the big unknown.
Contrast all of this with this statement made by the sociologist of science Robert K. Merton (apologies for the long quote): “The virtual absence of fraud in the annals of science, which appears exceptional when compared with the record of other spheres of activity, has at times been attributed to the personal qualities of scientists. By implication, scientists are recruited from the ranks of those who exhibit an unusual degree of moral integrity. There is, in fact, no satisfactory evidence that such is the case; a more plausible explanation may be found in certain distinctive characteristics of science itself. Involving as it does the verifiability of results, scientific research is under the exacting scrutiny of fellow experts. Otherwise put–and doubtless the observation can be interpreted as lèse majesty–the activities of scientists are subject to rigorous policing, to a degree perhaps unparalleled in any other field of activity.” This is from a paper entitled “The Normative Structure of Science”, originally published in 1942 and reprinted in the edited volume “The Sociology of Science: Theoretical and Empirical Investigations” (1973, p. 276).
The first time I read this, many years ago, I was doubtful of the “virtual absence” statement. I would think that nowadays, if a lecturer were to make a similar statement, the individual would be laughed out of the lecture hall. Merton looked at science since the 17th century. How did he come to make that statement? Is it true that up to 1942, science was blessed with a “virtual absence of fraud”? Perhaps. And after that, fraud blossomed? Really? What are the social forces nowadays that tempt so many to commit fraud? And perhaps pre-1942 (of course, this is a totally artificial cutoff) there were no, or fewer, incentives to scrutinize one’s colleagues when one suspected fraud? I am skeptical of Merton’s statement. Although I fancy myself a sociologist of science (at least, of social science), my knowledge of the history of science is not encyclopedic enough to answer these questions (assuming they are the right questions).
See here for data on the prevalence of scientific fraud: https://dynamicecology.wordpress.com/2020/02/17/some-data-and-historical-perspective-on-scientific-misconduct/
tl;dr: all the evidence indicates that fraudsters are a very small minority of scientists, even if you are pessimistic about the fraction of fraudsters who are found out.
Whether the rates of fraud are notably lower in science than in other walks of life, I have no idea.
Regarding the prevalence of fraud in sciences, and how it compares to other realms (e.g. art forgery), I do think there is probably less fraud in science, mainly due to the big elephant in the room — money. There are exceptions, of course, but in most cases people would not do science (mainly) to get rich. It’s just not the best way to go about getting lots of money (of course, there are exceptions like big money in some fields of medicine, but I believe they also involve much more scrutiny as an inherent part of the scientific method practiced there). On the other hand, people who forge famous art pieces do it, largely, for the money. And so I would think that there would be less fraud, and it would be done with less skill (for lack of a better term). As Jeremy pointed out in the context of his teaching experience, some people apparently cheat for unclear/non-rational reasons, and it seems likely that at least some of the fraud in science is done for similar reasons.
This also makes me wonder whether many of the cases of fraud in science are committed by people who convince themselves it is not really fraud. Imagine the following scenario: a researcher gets some data points from empirical work, sees what appears to be a clear trend in the data, and then realizes that the number of data points is not enough to be sure about it. But this person might convince themselves that the trend they found is real; they just cannot publish it due to the limited number of data points (I’m assuming more data points cannot be had due to some issue, e.g. lack of funding). An easy “fix” is to duplicate the data points: if there were 20 before, now there can be 40 or more, and voilà, they can publish a paper. I can imagine that, in their minds, they did not really cheat (i.e. make fake data points), but just cut some corners. I would assume similar scenarios exist.
The point here is that, in this case, the data “faking” (i.e. making the extra data points) might be badly done, in the sense that it is easy to spot, because the researcher does not want to invent data points out of thin air, but rather to have more of the same kind of data points. Could that explain why some of the forgeries are inherently bad and easy to spot? For comparison, I can imagine that some art forgers convince themselves that they are not “bad people”, since they cheat money out of rich people (much like in confidence tricks) who don’t really need it; but I think they would still acknowledge that they are not producing art in the typical sense.
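The duplication scenario above does suggest why such fakery is easy to spot: exact duplicate rows among supposedly independent continuous measurements are essentially impossible by chance. Here is a hypothetical sketch of such a check (not any journal's actual screening tool; the data are invented):

```python
# Hypothetical duplicate-row check for the padding scenario above: if a
# dataset is doubled by copying existing rows, exact duplicates appear far
# more often than chance allows for continuous measurements.
from collections import Counter

def duplicate_fraction(rows):
    """Fraction of rows that are exact copies of an earlier row."""
    counts = Counter(rows)
    extras = sum(c - 1 for c in counts.values())
    return extras / len(rows)

# 20 distinct toy "measurements", then the same 20 copied to double the n.
original = [(round(0.37 * i**1.5, 3),) for i in range(1, 21)]
padded = original + original
print(duplicate_fraction(padded))  # 0.5: half the rows duplicate an earlier one
```

A real screen would also look for near-duplicates (values identical up to rounding or a constant offset), but even this crude version would flag the "20 points copied to 40" case instantly.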
In terms of the motivations of scientific fraudsters, I don’t think we know a whole lot. And I’d also say that the motivations probably vary. For instance, in some middle income countries that have policies of paying cash awards for publishing papers in certain journals, the motivation probably is to make money. But if we’re restricting attention to frauds committed by scientists in wealthy countries, yeah, “making money” probably isn’t a common motivation.
And yes, some fraudsters probably don’t think of it as fraud. Because they’re sure they already know what the scientific truth is, it’s just that the data are too limited or messy to reveal the truth. So they massage the data.
I wonder if another common motivation is a feeling that scientific research is all just a big, meaningless, winner-take-all game, with a career in science as the reward for the few players who win. The game doesn’t matter to anyone except the players, so you might as well play the game to win.
There are other possible motivations too, of course.
But I dunno. I actually think that a lot of speculation about the motivations of scientific fraudsters (including my own speculation) is a matter of non-fraudsters trying to imagine what a comprehensible motivation for fraud might be.
Pingback: Links 9/4/2021 | naked capitalism